Abstract

We consider a random sparse graph with bounded average degree, in which a subset of 
vertices has higher connectivity than the background. In particular, the average degree inside 
this subset of vertices is larger than outside (but still bounded). Given a realization of such 
graph, we aim at identifying the hidden subset of vertices. This can be regarded as a model 
for the problem of finding a tightly knitted community in a social network, or a cluster in a 
relational dataset. 

In this paper we present two sets of contributions: (i) We use the cavity method from spin 
glass theory to derive an exact phase diagram for the reconstruction problem. In particular, as 
the difference in edge probability increases, the problem undergoes two phase transitions, a static 
phase transition and a dynamic one. (ii) We establish rigorous bounds on the dynamic phase
transition and prove that, above a certain threshold, a local algorithm (belief propagation)
correctly identifies most of the hidden set. Below the same threshold no local algorithm can
achieve this goal. However, in this regime the subset can be identified by exhaustive search.

For small hidden sets and large average degree, the phase transition for local algorithms takes
an intriguingly simple form. Local algorithms succeed with high probability for
$\deg_{\mathrm{in}} - \deg_{\mathrm{out}} > \sqrt{\deg_{\mathrm{out}}/e}$ and fail for
$\deg_{\mathrm{in}} - \deg_{\mathrm{out}} < \sqrt{\deg_{\mathrm{out}}/e}$ (with $\deg_{\mathrm{in}}$, $\deg_{\mathrm{out}}$ the average degrees
inside and outside the community). We argue that spectral algorithms are also ineffective in the
latter regime. It is an open problem whether any polynomial time algorithm might succeed for
$\deg_{\mathrm{in}} - \deg_{\mathrm{out}} < \sqrt{\deg_{\mathrm{out}}/e}$.


1 Introduction 

1.1 Motivation 

The problem of finding a highly connected subset of vertices in a large graph arises in a number 
of applications across science and engineering. Within social network analysis, a highly connected 
subset of nodes is interpreted as a community [For10]. Many approaches to data clustering and
dimensionality reduction construct a ‘similarity graph’ over the data points. A highly connected
subgraph corresponds to a cluster of similar data points [VL07].

A closely related problem arises in the analysis of matrix data, e.g. in microarray data analysis. 
In this context, researchers are often interested in a submatrix whose entries have an average value 
larger (or lower) than the rest [SWPN09]. Such an anomalous submatrix is interpreted as evidence
of association between gene expression levels and phenotypes (e.g. medical conditions). If we
consider the graph adjacency matrix, a highly connected subset of vertices corresponds indeed to
a principal submatrix with average value larger than the background. 

The special case of finding a completely connected subset of vertices (a clique) in a graph has
been intensely studied within theoretical computer science. Assuming $P \neq NP$, the largest clique in
a graph cannot be found in polynomial time. Even a very rough approximation to its size is hard
to find [Has96, Kho01]. In particular, it is hard to detect the presence of a clique of size $N^{1-\varepsilon}$ in a
graph with $N$ vertices.

Such hardness results motivated the study of random instances. In particular, the so-called
‘planted clique’ or ‘hidden clique’ problem [Jer92] asks to find a clique of size $k$ that is added
(planted) in a random graph with edge density 1/2. More precisely, for a subset of vertices $S \subseteq [N]$,
all edges $(i,j)$ with $\{i,j\} \subseteq S$ are present. All other edges are present independently with
probability 1/2. Such a clique can be found reliably by exhaustive search as soon as
$k \ge 2(1+\varepsilon)\log_2 N$ [GM75]. However, despite many efforts, no algorithm is known that achieves this goal
for $k \ll \sqrt{N}$ [AKS98, FK00, DGGP14]. In other words, the problem of finding cliques of size
$2\log_2 N \ll k \ll \sqrt{N}$ is solvable, but possibly hard. Proving that it is indeed computationally hard
to find cliques in this regime is an outstanding problem in theoretical computer science.

For general polynomial algorithms, it has been known since [AKS98] that a clique of size $\delta\sqrt{N}$ can be
found in time $N^{O(\log(1/\delta))}$ for any $\delta > 0$ fixed. Hence, if we allow any time complexity polynomial
in $N$, then the question is whether the planted clique can be found for $k = o(\sqrt{N})$.

A more stringent computational constraint requires that the clique is found in nearly-linear
time, i.e. in time of order $O(N^2(\log N)^c)$. Note that the number of bits required to encode
an instance of the problem is of order $N^2$, so $N^2(\log N)^c$ is a logarithmic multiple of the time
required to read an instance. Dekel, Gurel-Gurevich and Peres [DGGP14] developed a linear-time
algorithm (i.e. with complexity $O(N^2)$) that finds the hidden clique with high probability,
provided $k > 1.261\sqrt{N}$. In [DM14b] it was proved that, if $k > (1+\varepsilon)\sqrt{N/e}$, then there exists a
message passing algorithm that finds the clique with high probability using $O(N^2\log N)$ operations.
The same paper provided evidence that a certain class of ‘local’ algorithms fails at the same
threshold. Among other motivations, the present paper generalizes and supports the existence of
a fundamental threshold for local algorithms, at least in the sparse graph setting.

1.2 Rigorous contributions 

In the present paper, we consider the problem of finding a highly connected subset of vertices in 
a sparse graph, i.e. in a graph with bounded average degree. In this case, the hidden set size must 
scale linearly with $N$ to obtain a non-trivial behavior. Somewhat surprisingly, we find that the
phase transition ‘at $1/\sqrt{e}$’ leaves a trace also in the sparse regime.

More precisely, we consider a random graph generated as follows. We select a subset of vertices $S$
of size $\kappa N$, uniformly at random given its size. We connect any two vertices in the set independently
with probability $a/N$. Any other edge is added independently with probability $b/N$, $b < a$. The
problem distribution is therefore parametrized by $a, b, \kappa \in \mathbb{R}$ and we will be interested in
the limit $N \to \infty$, with $a, b, \kappa$ fixed. A more intuitive parametrization is obtained by replacing $a, b$
with the average degrees for vertices $i \in S$ and $i \notin S$, denoted, respectively, by $\deg_{\mathrm{in}}$, $\deg_{\mathrm{out}}$.

Our main rigorous result is a sharp phase transition in the following double asymptotics: 

• First $N \to \infty$. This corresponds to considering large graphs.
• Then $\kappa \to 0$ and $\deg_{\mathrm{in}}, \deg_{\mathrm{out}} \to \infty$. This corresponds to focusing on small hidden sets, but
still linear in $N$. The requirement $\deg_{\mathrm{in}}, \deg_{\mathrm{out}} \to \infty$ is a necessary consequence of $\kappa \to 0$: it
can be shown that otherwise the hidden set cannot possibly be detected.

Our main rigorous result establishes that, in the above double asymptotics, a
phase transition takes place for local algorithms at

$$\deg_{\mathrm{in}} - \deg_{\mathrm{out}} = \sqrt{\frac{\deg_{\mathrm{out}}}{e}}. \tag{1}$$


Namely, we consider the problem of testing whether a vertex $i$ is in $S$ or not. We say that such
a test is reliable if, in the above limit, the fraction of incorrectly estimated vertices vanishes in
expectation. Then:

• For $\deg_{\mathrm{in}} - \deg_{\mathrm{out}} > (1+\varepsilon)\sqrt{\deg_{\mathrm{out}}/e}$, a local algorithm can estimate $S$ reliably in time of
the order of the number of edges. This is achieved, for instance, by the belief propagation
algorithm.

• For $\deg_{\mathrm{in}} - \deg_{\mathrm{out}} < (1-\varepsilon)\sqrt{\deg_{\mathrm{out}}/e}$, no local algorithm can reliably reconstruct $S$.


Analogously to the classical hidden clique problem, there is a large gap between what can be 
achieved by local algorithms, and optimal estimation with unbounded computational resources. 
Proposition 4.1 establishes that exhaustive search will find $S$ in exponential time, as soon as
$\deg_{\mathrm{in}} - \deg_{\mathrm{out}} > \varepsilon\sqrt{\deg_{\mathrm{out}}/e}$ for some positive $\varepsilon$ (in the same double limit).

Note that, in both cases, a small fraction of the vertices in $S$ remains undetected because of the
graph sparsity, for any $\deg_{\mathrm{in}} < \infty$. In particular, the number of vertices of degree 0 is linear in $N$,
and such nodes cannot be identified. 

Let us finally mention that the degree of a vertex $i$ is Poisson with mean $\deg_{\mathrm{in}}$ if $i \in S$ and mean
$\deg_{\mathrm{out}}$ if $i \notin S$. Hence, the degree standard deviation (outside $S$) is $\sqrt{\deg_{\mathrm{out}}}$. Therefore, the ratio
$(\deg_{\mathrm{in}} - \deg_{\mathrm{out}})/\sqrt{\deg_{\mathrm{out}}}$ is the difference in mean degree divided by the standard deviation, and
has the natural interpretation of a ‘signal-to-noise ratio.’


1.3 Non-rigorous contributions 

While our rigorous analysis focuses on the limit $\kappa \to 0$ and $\deg_{\mathrm{in}}, \deg_{\mathrm{out}} \to \infty$ (after $N \to \infty$), we
will use the cavity method from spin glass theory to investigate the model behavior for arbitrary
$\deg_{\mathrm{in}}, \deg_{\mathrm{out}}, \kappa$ (in the $N \to \infty$ limit) or, equivalently, arbitrary $a, b, \kappa$.

We will use two approaches to obtain concrete predictions from the cavity method: 

• For general $a, b$ (bounded degree), we derive the cavity predictions for local quantities, as well
as for the free energy density. We show that the latter indeed coincides (up to a shift) with the
mutual information per variable between the hidden set $S$ and the observed graph $G$.

We use the ‘population dynamics’ (or ‘sampled density evolution’) algorithm [MP01, RU08,
MM09] to solve the cavity equations numerically.

• We then consider the limit of large $a, b$ (large degree) for arbitrary $\kappa$. In order to obtain a
non-trivial limit, the signal-to-noise ratio $\lambda = \kappa^2(a-b)^2/[(1-\kappa)b]$ is kept fixed in this limit,
together with $\kappa \in [0,1]$.
The cavity equations simplify in this limit (the cavity field distributions become Gaussian),
and we can derive an exact phase diagram, without recourse to intensive numerical methods, 
cf. Figure 

These two approaches are complementary in that the large-degree asymptotics yields closed-form
expressions. The qualitative features of the resulting phase diagram should remain unchanged at
moderately small values of $a, b$. Our population dynamics analysis confirms this.

As already mentioned, one of the motivations for the present work was to better understand
the computational phase transition discovered in [DM14b] for the classical hidden clique problem.
For background edge density 1/2, this takes place when the size of the hidden clique is $k \approx \sqrt{N/e}$.
This phase transition can indeed be formally recovered as a dense limit of the results presented in
this paper.

More precisely, the phase transition for hidden cliques [DM14b] is captured by Eq. (1), once we
rewrite the latter in terms of $\mathrm{var}_{\mathrm{out}}$, the variance of the degrees of nodes $i \notin S$. In the sparse regime,
the degree is approximately Poisson distributed, and hence $\mathrm{var}_{\mathrm{out}} \approx \deg_{\mathrm{out}}$. Therefore Eq. (1) can
be rewritten as $(\deg_{\mathrm{in}} - \deg_{\mathrm{out}})/\sqrt{\mathrm{var}_{\mathrm{out}}} = 1/\sqrt{e}$. For the classical (dense) hidden clique problem,
we have $\deg_{\mathrm{out}} = (N-1)/2$, $\deg_{\mathrm{in}} = (N+k-2)/2$ and $\mathrm{var}_{\mathrm{out}} = (N-1)/4$, and hence we recover
the condition $k \approx \sqrt{N/e}$.
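The dense-limit arithmetic above can be checked numerically. The following is a minimal sketch, not part of the original analysis; the function names are illustrative. It plugs the dense values of the mean degrees and the degree variance into the ratio $(\deg_{\mathrm{in}} - \deg_{\mathrm{out}})/\sqrt{\mathrm{var}_{\mathrm{out}}}$ and solves for the clique size at which the ratio equals $1/\sqrt{e}$.

```python
import math


def dense_clique_snr(n: float, k: float) -> float:
    """(deg_in - deg_out) / sqrt(var_out) for a clique of size k planted in G(n, 1/2)."""
    deg_out = (n - 1) / 2        # mean degree of a vertex outside the clique
    deg_in = (n + k - 2) / 2     # mean degree of a vertex inside the clique
    var_out = (n - 1) / 4        # degree variance outside the clique
    return (deg_in - deg_out) / math.sqrt(var_out)


def threshold_clique_size(n: float) -> float:
    """Clique size at which the ratio above equals 1/sqrt(e)."""
    # (k - 1) / sqrt(n - 1) = 1 / sqrt(e)  =>  k = 1 + sqrt((n - 1) / e)
    return 1 + math.sqrt((n - 1) / math.e)
```

For large $N$ the value returned by `threshold_clique_size` differs from $\sqrt{N/e}$ only by lower-order terms, which is the sense in which the condition $k \approx \sqrt{N/e}$ is recovered.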

From a different perspective, the present work offers a statistical mechanics interpretation of
the phase transitions in the hidden clique problem. Namely, the latter can be formally recovered
as the $\kappa \to 0$ limit of the phase diagram in Figure 5 below. In particular, the computational phase
transition at $k = \sqrt{N/e}$ corresponds to a dynamical phase transition (a spinodal point) in the
underlying statistical mechanics model.


1.4 Paper outline 


The rest of the paper is organized as follows. In the next section we define formally our model and
some related notations. Section 3 derives the phase diagram using the cavity method. In particular,
we show that the model undergoes two phase transitions as the signal-to-noise ratio increases (for
$k/N$ small enough): a static phase transition and a dynamic one. The two phase transitions are well
separated. Section 4 presents rigorous bounds on the behavior of local algorithms and exhaustive
search, that match the above phase transitions for small $k/N$. This section is self-contained and the
interested reader can move directly to it, after the model definition (some useful, but elementary
results are presented in Section 3.3). Proofs are deferred to the appendix. Finally, we position
our results in the context of recent literature.

Several research communities have been working on closely related problems: statistical physics, 
theoretical computer science, machine learning, statistics, information theory. We tried to write 
a paper that could be accessible to researchers with different backgrounds, both in terms of tools 
and of language. We apologize for any redundancy that might have followed from this approach. 


Notations 

We use $[\ell] = \{1, \dots, \ell\}$ to denote the set of first $\ell$ integers, and $|A|$ to denote the size (cardinality)
of set $A$. For a set $V$, we write $(i,j) \subseteq V$ to indicate that $(i,j)$ runs over all unordered pairs of
distinct elements in $V$. For instance, for a symmetric function $F(i,j) = F(j,i)$, we have

$$\prod_{(i,j)\subseteq [N]} F(i,j) = \prod_{1\le i<j\le N} F(i,j). \tag{2}$$

If instead $E$ is a set of edges over the vertex set $V$ (unordered pairs with elements in $V$) we write
$(i,j) \in E$ to denote elements of $E$.

We use $\mathsf{N}(\mu,\sigma^2)$ to denote the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Other
classical probability distributions are denoted in a way that should be self-explanatory (Bernoulli(p),
Poisson(c), and so on).


2 Model definition 


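We consider a random graph $G_N = (V_N, E_N)$ with vertex set $V_N = [N] = \{1, \dots, N\}$ and random
edges generated as follows. A set $S \subseteq V_N$ is chosen at random. Introducing the indicator variables

$$x_i = \begin{cases} 1 & \text{if } i \in S,\\ 0 & \text{otherwise,} \end{cases} \tag{3}$$

we let $x_i \in \{0,1\}$ independently with

$$P(x_i = 1) = \kappa. \tag{4}$$

In particular $|S|$ is a binomial random variable, and is tightly concentrated around its mean $E|S| = \kappa N$.
Edges are independent given $S$, with the following probability for $i, j \in V_N$ distinct:

$$P\big((i,j) \in E_N \,\big|\, S\big) = \begin{cases} a/N & \text{if } \{i,j\} \subseteq S,\\ b/N & \text{otherwise.} \end{cases} \tag{5}$$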
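The generative model of Eqs. (3) to (5) can be sketched in a few lines. This is an illustrative sampler only, assuming NumPy is available; the function name and its arguments are hypothetical, and the dense $N \times N$ probability matrix makes it suitable only for moderate $N$.

```python
import numpy as np


def sample_hidden_subgraph(n, kappa, a, b, seed=0):
    """Draw (x, edges) from the model: x_i ~ Bernoulli(kappa) i.i.d., and each pair
    {i, j} is an edge with probability a/n if both endpoints are in S, b/n otherwise."""
    rng = np.random.default_rng(seed)
    x = (rng.random(n) < kappa).astype(int)            # indicator vector of S, Eqs. (3)-(4)
    both_in_s = np.outer(x, x) == 1
    prob = np.where(both_in_s, a / n, b / n)            # edge probabilities, Eq. (5)
    upper = np.triu(rng.random((n, n)) < prob, k=1)     # keep each unordered pair once
    edges = np.argwhere(upper)                          # rows (i, j) with i < j
    return x, edges
```

The empirical average degree `2 * len(edges) / n` should be close to $\kappa^2 a + (1-\kappa^2) b$, which follows from averaging the two edge probabilities over the types of the endpoints.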


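We let $x = (x_1, \dots, x_N)$ denote the vector identifying $S$. By using Bayes theorem, the conditional
distribution of $x$ given $G$ is easily written

$$p_G(x) = P(x|G) = \frac{1}{Z(G)} \prod_{i\in V_N}\Big(\frac{\kappa}{1-\kappa}\Big)^{x_i} \prod_{(i,j)\subseteq[N]}\Big(\frac{1-a/N}{1-b/N}\Big)^{x_i x_j} \prod_{(i,j)\in E_N}\rho_N^{x_i x_j}, \tag{6}$$

where $\rho_N = (a/b)\big((1-b/N)/(1-a/N)\big)$.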

We next replace the last probability distribution with one that is equivalent as $N \to \infty$, and
slightly more convenient for the cavity calculations of the next section. (These simplifications will
not be used to prove the rigorous bounds in Section 4.) We first note that, as $N \to \infty$, we have
$\rho_N \to \rho$ with

$$\rho \equiv \frac{a}{b}. \tag{7}$$

Next, letting $|x| = \sum_{i=1}^{N} x_i$, we can rewrite the second product in Eq. (6) as

$$\prod_{(i,j)\subseteq[N]}\Big(\frac{1-a/N}{1-b/N}\Big)^{x_i x_j} = \Big(\frac{1-a/N}{1-b/N}\Big)^{\binom{|x|}{2}} = C\,\Big(\frac{1-a/N}{1-b/N}\Big)^{(2\kappa N|x| - |x|)/2}\Big(\frac{1-a/N}{1-b/N}\Big)^{(|x|-\kappa N)^2/2}, \tag{8}$$

with

$$C \equiv \Big(\frac{1-a/N}{1-b/N}\Big)^{-(\kappa N)^2/2}, \tag{9}$$

a constant independent of $x$. Notice that $|x| \sim \mathrm{Binom}(N, \kappa)$ is tightly concentrated around $E\{|x|\} = \kappa N$.
In particular $|x| = \kappa N + O(\sqrt{N})$ with high probability, and therefore the last term in Eq. (8)
is of order $O(1)$. We will neglect it, thus obtaining

$$\prod_{(i,j)\subseteq[N]}\Big(\frac{1-a/N}{1-b/N}\Big)^{x_i x_j} \cong C\,\Big(\frac{1-a/N}{1-b/N}\Big)^{(2\kappa N|x| - |x|)/2} \cong C\, e^{-\kappa(a-b)|x|}. \tag{10}$$


The error incurred by neglecting the last term in Eq. (8) can be corrected by considering the
following approximate conditional distribution of $x$ given the graph $G$

$$p_G(x) = \frac{1}{Z(G)} \prod_{(i,j)\in E}\rho^{x_i x_j} \prod_{i\in V}\gamma^{x_i}\; \mathbb{I}\Big(\sum_{i\in V_N} x_i = \kappa N\Big), \tag{11}$$

where $\mathbb{I}(A)$ is the indicator function on condition $A$ and

$$\gamma = \frac{\kappa}{1-\kappa}\, e^{-\kappa(a-b)}. \tag{12}$$

Note that we multiplied $p_G(\,\cdot\,)$ by the indicator function $\mathbb{I}(\sum_{i\in V_N} x_i = \kappa N)$. This can be interpreted
as replacing the i.i.d. Bernoulli distribution (4) with the uniform distribution over $S$ with $|S| = \kappa N$,
which is immaterial as long as local properties of $p_G(x)$ are considered.

In the following, we shall compare different reconstruction methods. Any such method corresponds
to a function $T_i(G) \in \{0,1\}$ of vertex $i$ and graph $G$, with the interpretation

$$T_i(G) = \begin{cases} 1 & \text{if } i \text{ is estimated to be in } S,\\ 0 & \text{if } i \text{ is estimated not to be in } S. \end{cases} \tag{13}$$

We characterize such a test through its rescaled success probability

$$P^{(N)}_{\mathrm{succ}}(T) = P\big(T_i(G) = 1 \,\big|\, i \in S\big) + P\big(T_i(G) = 0 \,\big|\, i \notin S\big) - 1. \tag{14}$$

Note that a trivial test (assigning $T_i(G) \in \{0,1\}$ at random independently of $G$) achieves $P^{(N)}_{\mathrm{succ}}(T) = 0$,
while a perfect test has $P^{(N)}_{\mathrm{succ}}(T) = 1$. We shall often omit the arguments $T$, $N$ from $P^{(N)}_{\mathrm{succ}}(T)$ in
the following.

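We note in passing that the optimal estimator with respect to the metric (14) is the maximum-likelihood
estimator

$$T^{\mathrm{opt}}_i(G) = \begin{cases} 1 & \text{if } P(G \,|\, i \in S) > P(G \,|\, i \notin S),\\ 0 & \text{if } P(G \,|\, i \in S) \le P(G \,|\, i \notin S). \end{cases} \tag{15}$$

Namely, for any other estimator $T$, we have $P^{(N)}_{\mathrm{succ}}(T) \le P^{(N)}_{\mathrm{succ}}(T^{\mathrm{opt}})$ (see, for instance, the textbook
[LC98] for a proof of this fact). The resulting success probability coincides with the total variation
distance between the conditional distributions of $G$ given the two hypotheses $i \in S$ and $i \notin S$. Recall
that, given two probability measures $p, q$ on the same finite space $\Omega$, their total variation distance
is defined as $\|p(\,\cdot\,) - q(\,\cdot\,)\|_{\mathrm{TV}} = (1/2)\sum_{\omega\in\Omega} |p(\omega) - q(\omega)|$. Then we have

$$P^{(N)}_{\mathrm{succ}}(T) \le P^{(N)}_{\mathrm{succ}}(T^{\mathrm{opt}}) = \big\| P(G \in \cdot \,|\, i \in S) - P(G \in \cdot \,|\, i \notin S) \big\|_{\mathrm{TV}}. \tag{16}$$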
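The identity between the success probability of the likelihood-ratio test and the total variation distance can be checked on a toy example. The sketch below is illustrative only: it replaces the graph $G$ by an outcome in a small finite space, with `p` playing the role of the law under $i \in S$ and `q` the law under $i \notin S$; the function names are hypothetical.

```python
import numpy as np


def total_variation(p, q):
    """(1/2) * sum over outcomes of |p - q|."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()


def rescaled_success(p, q, decide_in):
    """P(T = 1 | i in S) + P(T = 0 | i not in S) - 1 for a deterministic test.

    decide_in is a boolean array: True where the test outputs T = 1."""
    p, q = np.asarray(p), np.asarray(q)
    decide_in = np.asarray(decide_in, dtype=bool)
    return p[decide_in].sum() + q[~decide_in].sum() - 1.0
```

Choosing `decide_in = p > q` implements the test of Eq. (15), and its rescaled success equals `total_variation(p, q)`; any other choice of `decide_in` gives a value that is no larger.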


3 Phase transitions via cavity method


In this section we use the cavity method to derive an exact phase diagram of the model. It is 
convenient to introduce the following signal-to-noise-ratio parameter: 

$$\lambda \equiv \frac{(\deg_{\mathrm{in}} - \deg_{\mathrm{out}})^2}{(1-\kappa)\,\deg_{\mathrm{out}}}. \tag{17}$$

Using the fact that the degree outside $S$ is Poisson with mean $\deg_{\mathrm{out}} = b$, and inside is Poisson
with mean $\deg_{\mathrm{in}} = \kappa a + (1-\kappa) b$, we also have

$$\lambda = \frac{\kappa^2 (a-b)^2}{(1-\kappa)\, b}. \tag{18}$$

We will therefore think in terms of the three independent parameters: $\kappa$ (the relative size of $S$);
$b$ (the average degree in the background); $\lambda$ (the signal-to-noise ratio).
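As a quick sanity check of the parametrization, the following sketch converts $(\kappa, a, b)$ into the average degrees and the signal-to-noise ratio. The function name is illustrative and not from the paper.

```python
def model_parameters(kappa, a, b):
    """Return (deg_in, deg_out, lam) for the model with parameters kappa, a, b."""
    deg_out = b                                  # mean degree outside S
    deg_in = kappa * a + (1 - kappa) * b         # mean degree inside S
    lam = kappa**2 * (a - b) ** 2 / ((1 - kappa) * b)   # signal-to-noise ratio
    return deg_in, deg_out, lam
```

For example, `model_parameters(0.01, 400, 100)` gives $\deg_{\mathrm{in}} = 103$, $\deg_{\mathrm{out}} = 100$ and $\lambda = 9/99$; the same value of $\lambda$ is obtained from the degree form $(\deg_{\mathrm{in}} - \deg_{\mathrm{out}})^2 / ((1-\kappa)\deg_{\mathrm{out}})$.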

We generically find two solutions of the cavity recursion, which possibly coincide depending on
the parameter values. These correspond to two distinct phases of the statistical mechanics model,
and also have a useful algorithmic interpretation, which will be spelled out in detail in Section 3.3.

Initializing the recursion with the ‘exact solution’ of the reconstruction problem (‘plus’
initialization), we converge to a ferromagnetic fixed point. This provides an upper bound on the
performance of any reconstruction algorithm. Initializing the recursion with a completely oblivious
initialization (‘free’ initialization), we converge to a paramagnetic fixed point. This also corresponds
to the performance of the best possible local algorithm (see next section for a formal definition). A
very similar qualitative picture is found in other inference problems on random graphs, one early
example being the analysis of sparse graph codes [RU08, MM09]. An important simplification is
that we do not expect replica-symmetry breaking in these models [Nis01, Mon08].

Depending on the model parameters, we encounter two types of behaviors as $\lambda$ increases.

• For large $\kappa$ or small $b$, the two fixed points mentioned above coincide for all $\lambda$ and no phase
transition takes place.

• For small $\kappa$ and large $b$, two phase transitions take place: a static phase transition at $\lambda_s(\kappa, b)$
and a dynamic phase transition at a larger value $\lambda_d(\kappa, b)$. In addition, a spinodal point occurs
at $\lambda_{sp}(\kappa, b) < \lambda_s(\kappa, b) < \lambda_d(\kappa, b)$.

For $\lambda < \lambda_{sp}$ the two fixed points above coincide, and yield bad reconstruction. For $\lambda > \lambda_d$
they coincide and yield good reconstruction. In the intermediate phase $\lambda_{sp} < \lambda < \lambda_d$, the
two fixed points do not coincide. The relevant fixed point for Bayes-optimal reconstruction
is the one of smaller free energy, and the transition between the two takes place
at $\lambda_s$.

The reader might consult the figures below for an illustration. Also, a very similar phase diagram was obtained
in the related problem of sparse principal component analysis in [DM14a, LKZ15].


3.1 Cavity equations and population dynamics 


Fixing $i$, let $P(\,\cdot\,|\, i \in S)$ (respectively $P(\,\cdot\,|\, i \notin S)$) be the law of $G$ subject to $S$ containing
(respectively, not containing) vertex $i$. Consider the random variable

$$\xi_i(G) = \log \frac{P(x_i = 1 \,|\, G)}{P(x_i = 0 \,|\, G)}. \tag{19}$$

(19) 
The likelihood ratio test (maximizing $P_{\mathrm{succ}}$) amounts to choosing

$$T^{\mathrm{opt}}_i(G) = \mathbb{I}\Big(\xi_i(G) > \log\frac{\kappa}{1-\kappa}\Big). \tag{20}$$

As $N \to \infty$, the distribution of $\xi_i(G)$ under $P(\,\cdot\,|\, i \in S)$ converges to the law of a certain random
variable $\xi_1$, and the distribution of $\xi_i(G)$ under $P(\,\cdot\,|\, i \notin S)$ converges instead to that of $\xi_0$. The cavity
method allows us to write fixed point equations for these limit distributions. We omit details of the
derivation since they are straightforward given the model (11), and since it is sufficient here to
consider the replica-symmetric version of the method. General derivations can be found in [MM09,
Chapter 14]. A closely related calculation is carried out in [DKMZ11a], which studies a more
general random graph model, the so-called stochastic block model.

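The distributions of $\xi_0$, $\xi_1$ are fixed points of the following recursion (the symbol $\stackrel{d}{=}$ means that
the distributions of quantities on the two sides are equal)

$$\xi_0^{(t+1)} \stackrel{d}{=} h + \sum_{i=1}^{L_{00}} f\big(\xi_{0,i}^{(t)}\big) + \sum_{i=1}^{L_{01}} f\big(\xi_{1,i}^{(t)}\big), \tag{21}$$

$$\xi_1^{(t+1)} \stackrel{d}{=} h + \sum_{i=1}^{L_{10}} f\big(\xi_{0,i}^{(t)}\big) + \sum_{i=1}^{L_{11}} f\big(\xi_{1,i}^{(t)}\big). \tag{22}$$

Here $\xi_{0,i}^{(t)}$, $\xi_{1,i}^{(t)}$ are independent copies of $\xi_0^{(t)}$, $\xi_1^{(t)}$. Further $L_{00} \sim \mathrm{Poisson}((1-\kappa)b)$, $L_{01} \sim \mathrm{Poisson}(\kappa b)$,
$L_{10} \sim \mathrm{Poisson}((1-\kappa)b)$, $L_{11} \sim \mathrm{Poisson}(\kappa a)$ are independent Poisson random variables, independent
of the $\{\xi_{0,i}^{(t)}\}$, $\{\xi_{1,i}^{(t)}\}$. Finally,

$$h = \log\gamma = -\kappa(a-b) - \log\Big(\frac{1-\kappa}{\kappa}\Big), \tag{23}$$

and the function $f : \mathbb{R} \to \mathbb{R}$ is given by

$$f(\xi) = \log\frac{1 + \rho\, e^{\xi}}{1 + e^{\xi}}. \tag{24}$$

(Recall that $\rho = a/b$, cf. Eq. (7).)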

The cavity method predicts that the asymptotic distribution of $\xi_i(G)$ (conditional on $i \in S$
or $i \notin S$) is a fixed point of Eqs. (21), (22). In order to find the fixed points, we iterate these
distributional equations with two types of initial conditions (that correspond, respectively, to the
poor reconstruction and good reconstruction phases)

$$\text{free:}\qquad \xi_0^{(0)} = \log(\kappa/(1-\kappa)), \qquad \xi_1^{(0)} = \log(\kappa/(1-\kappa)), \tag{25}$$

$$\text{plus:}\qquad \xi_0^{(0)} = -\infty, \qquad \xi_1^{(0)} = +\infty. \tag{26}$$

We refer to Section 3.3 for the interpretation and monotonicity properties of these conditions: in
particular it can be proved that $\xi_0^{(t)}$, $\xi_1^{(t)}$ converge in distribution if initialized in this manner. We


^Another natural choice would be to minimize $P(T_i(G) \neq x_i)$. This is achieved by setting $T_i(G) = \mathbb{I}(\xi_i(G) > 0)$.















Figure 1: The success probability in the two different phases, for $\kappa = 0.005$ (left), $0.020$ (right) and
$b = 100$ (corresponding to average degree outside the set $S$, $\deg_{\mathrm{out}} = 100$). Red curves correspond to
$P_{\mathrm{succ}}(\mathrm{fr})$ (i.e. free boundary/initial conditions), and provide the optimal performance of local
algorithms. Blue curves yield $P_{\mathrm{succ}}(\mathrm{pl})$ (i.e. plus boundary/initial conditions) and yield an upper
bound on the performance of any algorithm. The continuous black line at $\lambda_s \approx 0.3$ coincides with
the phase transition of Bayes-optimal estimation. These curves were computed by averaging over
10 runs of the population dynamics algorithm with M = 10^ samples and 300 iterations. 


implemented Eqs. (21), (22) numerically using the ‘population dynamics’ method of [MP01] (also
known as ‘sampled density evolution’ [RU08, MM09]).

In Figure 1 we plot the predicted behavior of $P_{\mathrm{succ}}$ for $b = 100$ and two different values of the
hidden set size: $\kappa \in \{0.005, 0.020\}$. The success probability is predicted to be (for $N \to \infty$)

$$P_{\mathrm{succ}} = P(\xi_1 > 0) + P(\xi_0 < 0) - 1. \tag{27}$$

We denote by $P_{\mathrm{succ}}(\mathrm{fr})$ and $P_{\mathrm{succ}}(\mathrm{pl})$ the predictions obtained with the two initializations above.

As anticipated, two behaviors can be observed. For $\kappa$ sufficiently large, the curves $P_{\mathrm{succ}}(\mathrm{fr})$ and
$P_{\mathrm{succ}}(\mathrm{pl})$ coincide for all $\lambda$. When this happens, this is also the success probability of the optimal
likelihood ratio test and the latter can be effectively approximated using a local algorithm (e.g.
belief propagation), see Section 3.3. For $\kappa$ small the two curves remain distinct in an intermediate
interval of values: $\lambda \in (\lambda_{sp}, \lambda_d)$.

In this regime, the asymptotic behavior of the Bayes-optimal test is captured by the fixed point 
that yields the lowest free energy. It is convenient to define the rescaled free energy density as 
follows (assuming that the limit exists) 

$$\psi \equiv \frac{\kappa^2}{2}\Big(a\log\frac{a}{b} - 2a + 2b\Big) - \log(1-\kappa) - \lim_{N\to\infty}\frac{1}{N}\, E\log Z(G). \tag{28}$$

The reason for this choice of the additive constants is that the resulting free energy is also equal
to the asymptotic mutual information between the hidden set $S$ and the observed graph $G$

$$\psi = \lim_{N\to\infty}\frac{1}{N}\, I(S; G). \tag{29}$$

^As a technical parenthesis, we found it useful to impose the constraint $E(x_i) = \kappa$ in the sampled density evolution.
This was done using the method of [DMU04].


This quantity has therefore an immediate interpretation and several useful properties. 

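The replica symmetric cavity method (equivalently, Bethe-Peierls approximation) predicts

$$\psi = \min_{P_0, P_1} \Psi(P_0, P_1), \tag{30}$$

where the minimum is over all probability distributions $P_0, P_1$ over the real line satisfying the
following symmetry property (see Section 3.3 for further clarification on this property):

$$\frac{dP_1}{dP_0}(\xi) = \frac{1-\kappa}{\kappa}\, e^{\xi}. \tag{31}$$

The functional $\Psi$ is defined as follows

$$\Psi(P_0, P_1) = \Psi_0 + \Psi^{e} - \Psi^{v}, \tag{32}$$

$$\Psi^{e} = \frac{1}{2}\big(\kappa^2 a + (1-\kappa^2) b\big)\, E\log\Big\{1 + (\rho - 1)\,\frac{e^{\xi_{x_1,1} + \xi_{x_2,2}}}{(1 + e^{\xi_{x_1,1}})(1 + e^{\xi_{x_2,2}})}\Big\}, \tag{33}$$

$$\Psi^{v} = E\log\Big\{1 - \kappa + \kappa\, e^{-\kappa(a-b)} \prod_{i=1}^{L_0}\frac{1 + \rho\, e^{\xi_{0,i}}}{1 + e^{\xi_{0,i}}} \prod_{j=1}^{L_1}\frac{1 + \rho\, e^{\xi_{1,j}}}{1 + e^{\xi_{1,j}}}\Big\}, \tag{34}$$

$$\Psi_0 = \frac{\kappa^2}{2}\Big(a\log\frac{a}{b} - 2a + 2b\Big). \tag{35}$$

Here expectation is taken with respect to the following independent random variables:

• $\{\xi_{0,i}\}$ that are i.i.d. random variables with distribution $P_0$;

• $\{\xi_{1,i}\}$ that are i.i.d. random variables with distribution $P_1$;

• $(x_1, x_2) \in \{0,1\}^2$ with joint distribution $p_{1,1} = \kappa^2 a/z$, $p_{0,1} = p_{1,0} = \kappa(1-\kappa) b/z$, $p_{0,0} =
(1-\kappa)^2 b/z$, where $z = \kappa^2 a + (1-\kappa^2) b$.

• $(L_0, L_1)$ with the following mixture distribution. With probability $\kappa$: $L_0 \sim \mathrm{Poisson}((1-\kappa) b)$,
$L_1 \sim \mathrm{Poisson}(\kappa a)$. With probability $(1-\kappa)$: $L_0 \sim \mathrm{Poisson}((1-\kappa) b)$, $L_1 \sim \mathrm{Poisson}(\kappa b)$.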

Let $P_0^{\mathrm{pl}}, P_1^{\mathrm{pl}}$ and $P_0^{\mathrm{fr}}, P_1^{\mathrm{fr}}$ be the distributions of the fixed points obtained with plus and free initial conditions.
In Figure 2 we plot the minimum of the corresponding Bethe free energies $\Psi(\mathrm{pl}) = \Psi(P_0^{\mathrm{pl}}, P_1^{\mathrm{pl}})$ and
$\Psi(\mathrm{fr}) = \Psi(P_0^{\mathrm{fr}}, P_1^{\mathrm{fr}})$ for $b = 100$, $\kappa = 0.005$ (as obtained by the population dynamics algorithm).
This is the cavity prediction for the free energy density $\psi$. The value of $\lambda$ for which $\Psi(\mathrm{pl}) = \Psi(\mathrm{fr})$
corresponds to the phase transition point $\lambda_s$ between paramagnetic and ferromagnetic phases. From
the reconstruction point of view, this is the phase transition for Bayes-optimal estimation:

$$\lim_{N\to\infty} P^{(N)}_{\mathrm{succ}}(T^{\mathrm{opt}}) = \begin{cases} P_{\mathrm{succ}}(\mathrm{fr}) & \text{for } \lambda < \lambda_s,\\ P_{\mathrm{succ}}(\mathrm{pl}) & \text{for } \lambda > \lambda_s. \end{cases} \tag{36}$$

Notice from Figure 2 that, as expected, $\psi = \lim_{N\to\infty} I(G; S)/N$ is monotone increasing in the
signal-to-noise ratio $\lambda$, with $\psi \to 0$ as $\lambda \to 0$, and $\psi \to H(\kappa)$ as $\lambda \to \infty$ (here $H(\kappa) = -\kappa\log\kappa -
(1-\kappa)\log(1-\kappa)$ is the entropy of a Bernoulli random variable with mean $\kappa$). Also, the curve in Fig. 2
presents some ‘wiggles’ at large $\kappa$ that are due to the limited numerical accuracy of the population
dynamics algorithm.



Figure 2: The free energy density (equivalently, the mutual information per vertex), for $\kappa = 0.005$.
The horizontal line corresponds to the maximal mutual information $H(\kappa) \approx 0.03148$. The vertical
line at $\lambda_s \approx 0.3$ corresponds to the phase transition of the Bayes-optimal estimator. This curve was
computed by averaging over 10 runs of the population dynamics algorithm with M = 10^ samples 
and 300 iterations. 



[Figure 3 panels. Horizontal axis: $\lambda = \kappa^2(a-b)^2/((1-\kappa)b)$; the values $\lambda_{sp}$, $\lambda_s$, $\lambda_d$ are marked.]




Figure 3: Limit $a, b \to \infty$ with $\lambda$ and $\kappa$ fixed. Here $\kappa = 0.01$. Left frame: Success probability for
free boundary condition (equivalently, local algorithms, red curve), and plus boundary condition
(equivalently, general upper bound, blue curve). Center frame: free energy (equivalently, mutual
information per vertex) with same boundary conditions. Right frame: zoom of the free energy
curves.


3.2 Large-degree asymptotics 


In the previous section we solved numerically the distributional equations (21), (22). This approach 
is somewhat laborious and its accuracy is limited. Asymptotic expansions provide complementary 
analytical insights into the solution of these equations. 

Here we consider $a, b \to \infty$ with $\kappa$ fixed, and $(a-b)/\sqrt{b}$ converging to a limit. In particular, the
signal-to-noise ratio $\lambda$ is also a constant. Let us emphasize once more that these limits are taken
after $N \to \infty$ and hence the graph is still sparse.


























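Figure 4: Limit $a, b \to \infty$ with $\lambda$ and $\kappa$ fixed. Here we plot the (shifted) free energy function
$\Psi(\mu) - \Psi(0)$ for $\kappa = 0.01$ and $\lambda \in \{0.16, 0.19, 0.22\}$. Comparing with the other large-degree figures we see that
$0.16 < \lambda_{sp}(\kappa)$, $\lambda_{sp}(\kappa) < 0.19 < \lambda_s(\kappa)$, and $\lambda_{sp}(\kappa) < \lambda_s(\kappa) < 0.22 < \lambda_d(\kappa)$.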


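In this limit, the fixed points of Eqs. (21), (22) take the form

$$\xi_0 \sim \log\frac{\kappa}{1-\kappa} + \mathsf{N}\Big(-\frac{\mu}{2},\, \mu\Big), \tag{37}$$

$$\xi_1 \sim \log\frac{\kappa}{1-\kappa} + \mathsf{N}\Big(\frac{\mu}{2},\, \mu\Big). \tag{38}$$

Further $\mu$ satisfies the fixed point equation

$$\mu = \lambda\, F(\mu; \kappa), \tag{39}$$

where the function $F(\,\cdot\,; \kappa)$ is defined by

$$F(\mu; \kappa) = E\Big\{\frac{1-\kappa}{\kappa + (1-\kappa)\exp\big(-\sqrt{\mu}\, Z - \mu/2\big)}\Big\}, \tag{40}$$

with expectation being taken with respect to $Z \sim \mathsf{N}(0,1)$. In other words, the distributional
equations (21), (22) reduce to a single nonlinear equation for the scalar $\mu$. Large $\mu$ corresponds to
accurate recovery.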
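The scalar fixed point equation can be explored with a few lines of code. The sketch below assumes that $F(\mu;\kappa)$ is the Gaussian expectation $E\{(1-\kappa)/(\kappa + (1-\kappa)e^{-\sqrt{\mu}Z - \mu/2})\}$, which is consistent with the limit $F(\mu; 0) = e^{\mu}$ used later in this section. The expectation is evaluated by Gauss-Hermite quadrature; the function names, the number of nodes and the iteration count are illustrative choices.

```python
import numpy as np

# Nodes and weights for the weight function exp(-z^2 / 2); normalizing the weights
# turns the quadrature sum into an expectation over Z ~ N(0, 1).
_Z, _W = np.polynomial.hermite_e.hermegauss(80)
_W = _W / _W.sum()


def F(mu, kappa):
    """Quadrature estimate of E[(1 - kappa) / (kappa + (1 - kappa) exp(-sqrt(mu) Z - mu/2))]."""
    expo = -0.5 * mu - np.sqrt(mu) * _Z
    return float(np.sum(_W * (1.0 - kappa) / (kappa + (1.0 - kappa) * np.exp(expo))))


def fixed_point(lam, kappa, init="free", iters=5000):
    """Iterate mu <- lam * F(mu, kappa) from mu = 0 ('free') or from an upper bound ('plus')."""
    # F is bounded above by (1 - kappa) / kappa, so lam * (1 - kappa) / kappa bounds any fixed point.
    mu = 0.0 if init == "free" else lam * (1.0 - kappa) / kappa
    for _ in range(iters):
        mu = lam * F(mu, kappa)
    return mu
```

Since $F$ is increasing in $\mu$, the iteration started at $\mu = 0$ increases towards the smallest fixed point, while the one started at the upper bound decreases towards the largest. Comparing `fixed_point(lam, kappa, "free")` with `fixed_point(lam, kappa, "plus")` therefore indicates whether the two phases coincide at given $(\lambda, \kappa)$.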

More formally, we expect the distributional solutions of Eqs. (21), (22) to converge to solutions
of Eqs. (37) to (40). We do not provide a ‘physicists’ derivation of this statement since this follows
heuristically from Lemma 4.4. The latter establishes that iterating the cavity equations (21),
(22) any fixed number of times $t$ is equivalent (in the large-degree limit) to iterating Eqs. (37) to
(40).


^Of course Lemma 4.4 does not prove rigorously that the fixed points of Eqs. (21), (22) converge to fixed points
of Eqs. (37) to (40). A complete proof would require controlling the convergence rate to fixed points. However, in
heuristic statistical physics derivations this is typically not done. Also, the proof of Lemma 4.4 follows the same
strategy that would be employed in a heuristic derivation.

Figure 5: Phase diagram of the hidden subgraph problem in the large degree limit $a, b \to \infty$, with
$\kappa = E|S|/N$ (relative size of the hidden set) and $\lambda = \kappa^2(a-b)^2/((1-\kappa)b)$ (signal-to-noise ratio)
fixed. The three curves are, from top to bottom, $\lambda_d(\kappa)$, $\lambda_s(\kappa)$ and $\lambda_{sp}(\kappa)$.


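The free energy (32) becomes a function of $\mu$ (we still denote it by $\Psi$ with a slight abuse of
notation):

$$\Psi(\mu) = \frac{(1-\kappa)\lambda}{4} + \frac{\kappa^2\mu^2}{4(1-\kappa)\lambda} - E\log\Big\{1 - \kappa + \kappa\exp\Big(-\frac{\mu}{2} + \sqrt{\mu}\, Z + \mu X\Big)\Big\}, \tag{41}$$

where expectation is with respect to independent random variables $X \sim \mathrm{Bernoulli}(\kappa)$ and $Z \sim
\mathsf{N}(0,1)$. Its local minima are solutions of Eq. (39).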


Equation (39) can be easily solved numerically, yielding the phase diagram in Figure 5. As
before, we obtain phase transitions $\lambda_{sp}(\kappa) < \lambda_s(\kappa) < \lambda_d(\kappa)$ as long as $\kappa$ is below a critical point,
$\kappa < \kappa_*$. The critical point location is

$$\kappa_* \approx 0.04139, \qquad \lambda_* \approx 0.5176. \tag{42}$$

The free energy $\Psi(\mu)$ has two local minima for $\kappa < \kappa_*$, $\lambda \in (\lambda_{sp}(\kappa), \lambda_d(\kappa))$, and one local
minimum otherwise. The local minimum at larger $\mu$ is the global minimum for $\lambda > \lambda_s(\kappa)$, while the one at smaller $\mu$ is the
global minimum for $\lambda < \lambda_s(\kappa)$. We refer to Figures 3 and 4 for illustration.

Of particular interest is the case of small hidden subsets, i.e. the limit $\kappa \to 0$ (note that $|S|$ is
still linear in $N$). For small $\kappa$ we have $\lim_{\kappa\to 0} F(\mu; \kappa) = F(\mu; 0) = e^{\mu}$. Hence the solutions of (39) that
stay bounded converge to the solutions of

$$\mu = \lambda\, e^{\mu}. \tag{43}$$

This equation has two solutions for $\lambda < 1/e$ and no solution for $\lambda > 1/e$. This implies that

$$\lim_{\kappa\to 0} \lambda_d(\kappa) = \frac{1}{e}, \tag{44}$$

which is the result announced in Eq. (1). It is also easy to see that $\lambda_s(\kappa), \lambda_{sp}(\kappa) \to 0$ as $\kappa \to 0$.
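The claim that $\mu = \lambda e^{\mu}$ has solutions only for $\lambda \le 1/e$ follows from the fact that $\mu e^{-\mu}$ attains its maximum $1/e$ at $\mu = 1$. A minimal numerical illustration, with an illustrative function name and arbitrary tolerances, iterates the map from $\mu = 0$ and reports whether it settles on a fixed point:

```python
import math


def smallest_solution(lam, iters=100000, tol=1e-12, blow_up=50.0):
    """Iterate mu <- lam * exp(mu) from mu = 0.

    Returns the limiting fixed point, or None if the iterates exceed blow_up
    (no bounded solution) or fail to converge within iters steps."""
    mu = 0.0
    for _ in range(iters):
        new = lam * math.exp(mu)
        if new > blow_up:
            return None
        if abs(new - mu) < tol:
            return new
        mu = new
    return None
```

With this sketch, a value of $\lambda$ slightly below $1/e \approx 0.3679$ yields a finite fixed point, while a value slightly above it makes the iterates diverge.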


3.3 Algorithmic interpretation


The distributional equations (21) and (22) define a sequence of probability distributions indexed
by $t \in \{0,1,2,\dots\}$. More precisely, for every $t$ the recursion defines the probability distributions
$P_0^{(t)}$ (the distribution of $\xi_0^{(t)}$) and $P_1^{(t)}$ (the distribution of $\xi_1^{(t)}$). When specialized to the free/plus
initial conditions (cf. Eqs. (25), (26)), these probability distributions have a simple and useful
interpretation that we will now explain.

Define $B_t(G,i)$ to be the ball of radius $t$ centered at $i \in V$, in graph $G$. Namely, this is the
subset of vertices of $G$ whose distance from $i$ is at most $t$. By a slight abuse of notation, this will also
denote the subgraph induced in $G$ by those vertices. The following remarks are straightforward.
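As an illustration of the definition of the radius-$t$ ball, the sketch below computes the vertex set by breadth-first search and then extracts the induced subgraph. The adjacency-dictionary representation and the function names `ball` and `induced_subgraph` are assumptions made for this example only.

```python
from collections import deque

def ball(adj, i, t):
    """Set of vertices at graph distance at most t from vertex i.

    adj: dict mapping each vertex to an iterable of its neighbours
    (an assumed representation of the graph G).
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == t:             # do not explore beyond radius t
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def induced_subgraph(adj, vertices):
    """Adjacency dict of the subgraph induced by the given vertex set."""
    return {u: [v for v in adj.get(u, ()) if v in vertices] for u in vertices}
```

A $t$-local test in the sense discussed in this section only looks at `induced_subgraph(adj, ball(adj, i, t))`, rooted at `i`.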


Free boundary condition. Consider the optimal test $T_i(G)$ among those that only use local
information. In other words, $T_i(G)$ is the optimal test that is a function of $B_t(G,i)$. This is again
a likelihood ratio test. Concretely, we can define the log-likelihood ratio

$$ \xi_i(G;t) \equiv \log \frac{\mathbb{P}(x_i = 1 \mid B_t(G,i))}{\mathbb{P}(x_i = 0 \mid B_t(G,i))} . \tag{45} $$

Then the optimal test takes the form $T_i(G) = \mathbb{I}(\xi_i(G;t) \ge \log[\kappa/(1-\kappa)])$ (if we are interested in
maximizing $P^{(N)}_{\rm succ}(T)$) or $T_i(G) = \mathbb{I}(\xi_i(G;t) \ge 0)$ (if we are interested in minimizing the expected
number of incorrectly assigned vertices).

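Fixing the depth parameter $t$, the distribution of $\xi_i(G;t)$ converges (as $N \to \infty$) to $P_1^{(t),\mathrm{fr}}$ for
$i \in S$, and to $P_0^{(t),\mathrm{fr}}$ for $i \notin S$. Mathematically, for any fixed $i$,

\begin{align}
\xi_i(G;t) &\Rightarrow P_1^{(t),\mathrm{fr}} \quad \text{under } \mathbb{P}(\,\cdot \mid i \in S), \tag{46}\\
\xi_i(G;t) &\Rightarrow P_0^{(t),\mathrm{fr}} \quad \text{under } \mathbb{P}(\,\cdot \mid i \notin S). \tag{47}
\end{align}

In particular, for any fixed $t$, the success probability

$$ P_{\rm succ}(t;\mathrm{fr}) = P_0^{(t),\mathrm{fr}}\big(\xi < \log[\kappa/(1-\kappa)]\big) + P_1^{(t),\mathrm{fr}}\big(\xi \ge \log[\kappa/(1-\kappa)]\big) - 1 \tag{48} $$

is the maximum asymptotic success probability achieved by any test that is $t$-local (in the sense
of being a function of depth-$t$ neighborhoods). It follows immediately from the definition that
$P_{\rm succ}(t;\mathrm{fr})$ is monotone increasing in $t$. Its $t \to \infty$ limit $P_{\rm succ}(\mathrm{fr})$ is the maximum success probability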
achieved by any local algorithm. This quantity was computed through population dynamics in the 
previous section, see Figure [T] 

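Plus boundary condition. Let $\overline{B}_t(G,i)$ be the complement of $B_t(G,i)$, i.e. the set of vertices of
$G$ that have distance at least $t$ from $i$. Then $P_0^{(t),\mathrm{pl}}$, $P_1^{(t),\mathrm{pl}}$ have the interpretation of being the laws of the
log-likelihood ratio when information is revealed about the labels of vertices in $\overline{B}_{t-1}(G,i)$. Namely, if we define

$$ \overline{\xi}_i(G;t) \equiv \log \frac{\mathbb{P}\big(x_i = 1 \mid B_t(G,i),\, x_{\overline{B}_t(G,i)}\big)}{\mathbb{P}\big(x_i = 0 \mid B_t(G,i),\, x_{\overline{B}_t(G,i)}\big)} , \tag{49} $$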
Footnote: The discussion follows very closely what happens in other inference problems, for instance in the analysis of sparse
graph codes [MM09, RU08].











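then we have

\begin{align}
\overline{\xi}_i(G;t) &\Rightarrow P_1^{(t),\mathrm{pl}} \quad \text{under } \mathbb{P}(\,\cdot \mid i \in S), \tag{50}\\
\overline{\xi}_i(G;t) &\Rightarrow P_0^{(t),\mathrm{pl}} \quad \text{under } \mathbb{P}(\,\cdot \mid i \notin S). \tag{51}
\end{align}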


In particular, Psucc(pl) is an upper bound on the performance of any estimator. In the previous 
section we computed this quantity numerically through population dynamics. 


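Let us finally comment on the relation (31) between $P_0$ and $P_1$. This is an elementary consequence
of Bayes formula: consequences of this relation have been useful in statistical physics under
the name of 'Nishimori property' [Nis01]. It is also known in coding theory as 'symmetry condition'
[RU08]. Consider the general setting of two random variables $X, Y$, with $X \in \{0,1\}$, $\mathbb{P}(X = 1) = \kappa$,
and let $\xi(Y) = \log[\mathbb{P}(X = 1 \mid Y)/\mathbb{P}(X = 0 \mid Y)]$. Then for any interval $A$ (with non-zero probability),
applying Bayes formula,

\begin{align}
\mathbb{P}(\xi(Y) \in A \mid X = 1) &= \frac{\mathbb{P}(\xi(Y) \in A;\, X = 1)}{\mathbb{P}(X = 1)} \tag{52}\\
&= \frac{1}{\kappa}\,\mathbb{E}\{\mathbb{I}(\xi(Y) \in A)\,\mathbb{P}(X = 1 \mid Y)\} = \frac{1}{\kappa}\,\mathbb{E}\{\mathbb{I}(\xi(Y) \in A)\, e^{\xi(Y)}\, \mathbb{I}(X = 0)\} \tag{53}\\
&= \frac{1-\kappa}{\kappa}\,\mathbb{E}\{\mathbb{I}(\xi(Y) \in A)\, e^{\xi(Y)} \mid X = 0\} , \tag{54}
\end{align}

which is the claimed property.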
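The identity between the two conditional laws can be verified exactly on a toy example. The sketch below assumes a finite observation alphabet with strictly positive conditional probabilities; the function name `check_symmetry` and the numbers used in the usage note are illustrative and are not taken from the paper.

```python
import math

def check_symmetry(kappa, p_y_given_0, p_y_given_1, interval):
    """Compare both sides of the symmetry (Nishimori) identity.

    kappa: prior P(X = 1).
    p_y_given_0, p_y_given_1: lists with P(Y = y | X = 0) and P(Y = y | X = 1)
    over a common finite alphabet (all entries assumed positive).
    interval: pair (lo, hi) defining the set A = [lo, hi].
    Returns (P(xi in A | X = 1), ((1-kappa)/kappa) * E[1{xi in A} e^xi | X = 0]).
    """
    lo, hi = interval
    lhs = 0.0
    rhs = 0.0
    for q0, q1 in zip(p_y_given_0, p_y_given_1):
        # posterior log-odds xi(y) = log P(X=1 | y) / P(X=0 | y)
        xi = math.log(kappa * q1) - math.log((1.0 - kappa) * q0)
        if lo <= xi <= hi:
            lhs += q1
            rhs += (1.0 - kappa) / kappa * math.exp(xi) * q0
    return lhs, rhs
```

With `kappa = 0.2`, `p_y_given_0 = [0.5, 0.3, 0.2]` and `p_y_given_1 = [0.1, 0.3, 0.6]`, the two returned values agree for every choice of interval, up to floating-point rounding.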


4 Rigorous results 


In the previous section we relied on the non-rigorous cavity method from spin glass theory to 
derive the phase diagram. Most notably we used numerical methods, and formal large-degree 
asymptotics to study the distributional equations (21), (22). Here we will establish rigorously some 
key implications of the phase diagram, namely: 


• By exhaustive search over all subsets of $k$ vertices in $G$, we can estimate $S$ accurately for any
$\lambda > 0$ and $\kappa$ small.

• Local algorithms succeed in reconstructing $S$ accurately if $\lambda > 1/e$, and fail for $\lambda < 1/e$
(assuming large degrees and $\kappa$ small).


4.1 Exhaustive search 

Given a set of vertices $R \subseteq [N]$, we let $E(R)$ denote the number of edges with both endpoints in $R$.
Exhaustive search maximizes this quantity among all the sets that have the 'right size.' Namely, it
outputs

$$ \hat{S}^{\rm ex} = \arg\max_{R \subseteq [N]} \big\{ E(R) \, : \, |R| = \lfloor \kappa N \rfloor \big\} . \tag{55} $$

(If multiple maximizers exist, one of them is selected arbitrarily.) We can also define a test function
$T^{\rm ex}_i(G)$ by letting $T^{\rm ex}_i(G) = 1$ for $i \in \hat{S}^{\rm ex}$ and $T^{\rm ex}_i(G) = 0$ otherwise. Note that, for $\kappa N$ growing with
$N$, this algorithm is non-polynomial and hence cannot be used in practice. It provides however a
useful benchmark.
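For very small graphs the exhaustive search estimator can be written down directly. The following is a minimal sketch under stated assumptions: vertices are labelled `0, ..., n-1`, the graph is given as an edge list, and the function name `exhaustive_search` is hypothetical. Ties are broken by enumeration order, which is one admissible instance of the arbitrary tie-breaking rule. The running time grows like $\binom{n}{k}$, which is why this is only a benchmark.

```python
from itertools import combinations

def exhaustive_search(n, edges, k):
    """Size-k subset of {0, ..., n-1} with the largest number of internal edges.

    Returns (best_subset, number_of_internal_edges).
    """
    edge_set = {frozenset(e) for e in edges}
    best, best_count = None, -1
    for R in combinations(range(n), k):
        # E(R): number of edges with both endpoints in R
        count = sum(1 for pair in combinations(R, 2) if frozenset(pair) in edge_set)
        if count > best_count:
            best, best_count = set(R), count
    return best, best_count
```

On a 6-vertex graph consisting of the triangle `{0, 1, 2}` plus the edges `(2, 3)` and `(3, 4)`, calling `exhaustive_search(6, edges, 3)` recovers the triangle.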

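We have the following result showing that exhaustive search reconstructs $S$ accurately, for any
constant $\lambda$ and $\kappa$ small. We refer to Section A for a proof.

Proposition 4.1. Let $P^{\rm ex}_{\rm succ} = \limsup_{N\to\infty} P^{(N)}_{\rm succ}(T^{\rm ex})$ be the asymptotic success probability of exhaustive
search and assume $\kappa < 1/2$. Then

$$ P^{\rm ex}_{\rm succ} \ge 1 - \frac{2e}{\sqrt{\kappa}} \exp\Big( - \frac{\lambda (1-\kappa) b}{16\, \kappa\, a} \Big) . \tag{56} $$

In particular, we have the following large degree asymptotics as $a, b \to \infty$ with $\lambda$, $\kappa$ fixed

$$ P^{\rm ex}_{\rm succ}(\mathrm{deg} = \infty) \equiv \liminf_{a,b\to\infty} P^{\rm ex}_{\rm succ} \ge 1 - \frac{2e}{\sqrt{\kappa}} \exp\Big( - \frac{\lambda(1-\kappa)}{16\, \kappa} \Big) , \tag{57} $$

and $P^{\rm ex}_{\rm succ}(\mathrm{deg} = \infty) \to 1$ as $\kappa \to 0$ for any $\lambda > 0$ fixed.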


4.2 Local algorithms 

We next give a formal definition of $t$-local algorithms. Let $\mathcal{G}_*$ be the space of unlabeled rooted
graphs, i.e. the space of graphs with one distinguished vertex (see, for instance, [Mon15] for more
details). Formally, an estimator $T_i(G)$ for the hidden set problem is a function $(G,i) \mapsto T(G;i) =
T_i(G) \in \{0,1\}$. Since the pair $(G,i)$ is indeed a graph with one distinguished vertex (and the
vertex labels clearly do not matter), we can view $T$ as a function on $\mathcal{G}_*$:

$$ T : \mathcal{G}_* \to \{0,1\} . \tag{58} $$

The following definition formalizes the discussion in Section 3.3 (where the definition of $B_t(G,i)$ is
also given). The key fact about this definition is that $t$ (the 'locality radius') is kept fixed, while
the graph size can be arbitrarily large.

Definition 4.2. Given a non-negative integer $t$, we say that a test $T$ is $t$-local if there exists a
function $\mathcal{F} : \mathcal{G}_* \to \{0,1\}$ such that, for all $(G,i) \in \mathcal{G}_*$,

$$ T_i(G) = \mathcal{F}(B_t(G,i)) . \tag{59} $$

We say that a test is local if it is $t$-local for some fixed $t$.

We denote by $\mathrm{Loc}(t)$ and $\mathrm{Loc} \equiv \cup_{t\ge 0}\mathrm{Loc}(t)$ the sets of $t$-local and local tests.


The next lemma is a well-known fact that we nevertheless state explicitly to formalize some
of the remarks of Section 3.3. Recall that $P^{(N)}_{\rm succ}(T)$ denotes the success probability of test $T$, as
per Eq. (14), and let $P_{\rm succ}(t;\mathrm{fr})$ be defined as in Eq. (48), with $P_0^{(t),\mathrm{fr}}$, $P_1^{(t),\mathrm{fr}}$ the laws of the random
variables $\xi_0^{(t),\mathrm{fr}}$, $\xi_1^{(t),\mathrm{fr}}$.

Lemma 4.3. We have

$$ \sup_{T \in \mathrm{Loc}(t)} \lim_{N\to\infty} P^{(N)}_{\rm succ}(T) = P_{\rm succ}(t;\mathrm{fr}) . \tag{60} $$

In particular

$$ \sup_{T \in \mathrm{Loc}} \lim_{N\to\infty} P^{(N)}_{\rm succ}(T) = P_{\rm succ}(\mathrm{fr}) \equiv \lim_{t\to\infty} P_{\rm succ}(t;\mathrm{fr}) . \tag{61} $$

Further, the maximal local success probability $P_{\rm succ}(t;\mathrm{fr})$ can be achieved using belief propagation
with respect to the graphical model (11) in $O(t|E|)$ time.

We will therefore evaluate the fundamental limits of local algorithms by analyzing the quantity
$P_{\rm succ}(\mathrm{fr})$. The following theorem establishes a phase transition for this quantity at $\lambda = 1/e$.


Theorem 1. Consider the hidden set problem with parameters $a$, $b$, $\kappa$, and let $\lambda \equiv \kappa^2(a-b)^2/((1-\kappa)b)$.
Then:

(a). If $\lambda < 1/e$, then all local algorithms have success probability uniformly bounded away from
one. In particular, letting $x_*(\lambda) < e$ be the smallest positive solution of $x = e^{\lambda x}$, we have

sup lim Pi^ec(T) = Psucc(fr) < ^ . (62) 

4 4 


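(b). If $\lambda > 1/e$, then local algorithms can have success probability arbitrarily close to one. In
particular, considering the large degree asymptotics $a, b \to \infty$ with $\kappa$, $\lambda$ fixed,

$$ \liminf_{a,b\to\infty} P_{\rm succ}(\mathrm{fr}) \equiv P^{\rm large\ deg}_{\rm succ}(\mathrm{fr};\kappa,\lambda) , \tag{63} $$

we have

$$ \lim_{\kappa\to 0} P^{\rm large\ deg}_{\rm succ}(\mathrm{fr};\kappa,\lambda) = 1 . \tag{64} $$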


As a useful technical tool in proving part (b) of this theorem, we establish a normal approximation
result in the spirit of Eqs. (37), (38). In order to state this result, we recall the definition
of Wasserstein distance of order 2, $W_2(\mu,\nu)$, between two probability measures $\mu$, $\nu$ on $\mathbb{R}$, with
finite second moment $\int x^2 \nu(dx) < \infty$, $\int x^2 \mu(dx) < \infty$. Namely, denoting by $\mathcal{C}(\nu,\mu)$ the family of
couplings of $\mu$ and $\nu$, we have

$$ W_2(\nu,\mu) \equiv \Big\{ \inf_{\gamma \in \mathcal{C}(\nu,\mu)} \int |x-y|^2 \, \gamma(dx,dy) \Big\}^{1/2} . \tag{65} $$

Given a sequence of probability measures $\{\nu_n\}_{n\in\mathbb{N}}$ with finite second moment, we write $\nu_n \stackrel{W_2}{\to} \nu$ if
$W_2(\nu_n,\nu) \to 0$.
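For intuition, the Wasserstein distance of order 2 between two empirical measures on the real line with the same number of equally weighted atoms can be computed in closed form, because in one dimension the monotone (sorted) coupling is optimal for the quadratic cost. The sketch below relies on that standard fact; the function name `wasserstein2_empirical` is an assumption of this example, and the equal-sample-size restriction is a simplification.

```python
import math

def wasserstein2_empirical(xs, ys):
    """W_2 distance between the empirical measures of two equal-size samples.

    Pairs the i-th smallest point of xs with the i-th smallest point of ys,
    which is the optimal coupling for quadratic cost on the real line.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    mean_sq = sum((x - y) ** 2 for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return math.sqrt(mean_sq)
```

Shifting every atom by a constant $c$ gives distance $|c|$, and the distance is zero when the two samples coincide as multisets.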

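Lemma 4.4. For $t \ge 0$, let $\xi_0^{(t)}$, $\xi_1^{(t)}$ be the random variables defined by the distributional recursion
(21), (22), with initial condition (25), and denote by $P_0^{(t),\mathrm{fr}}$, $P_1^{(t),\mathrm{fr}}$ the corresponding laws. Further
let $\mu^{(t)}$ be defined recursively by letting $\mu^{(0)} = 0$ and

$$ \mu^{(t+1)} = \lambda\, F(\mu^{(t)};\kappa) , \quad \text{where} \quad F(\mu;\kappa) \equiv \mathbb{E}\Big\{ \frac{1-\kappa}{\kappa + (1-\kappa)\exp\{-\mu/2 + \sqrt{\mu}\, Z\}} \Big\} , \quad Z \sim \mathsf{N}(0,1) . \tag{66} $$

Then, considering the limit $a, b \to \infty$ with $\kappa$ fixed and $\kappa^2(a-b)^2/((1-\kappa)b) \to \lambda \in (0,\infty)$, we have

\begin{align}
P_0^{(t),\mathrm{fr}} &\stackrel{W_2}{\to} \mathsf{N}\Big( -\log\Big(\frac{1-\kappa}{\kappa}\Big) - \frac{1}{2}\mu^{(t)} ,\; \mu^{(t)} \Big) , \tag{67}\\
P_1^{(t),\mathrm{fr}} &\stackrel{W_2}{\to} \mathsf{N}\Big( -\log\Big(\frac{1-\kappa}{\kappa}\Big) + \frac{1}{2}\mu^{(t)} ,\; \mu^{(t)} \Big) . \tag{68}
\end{align}

The proof of this lemma is presented in Section B.1.

Footnote: Explicitly, $\gamma \in \mathcal{C}(\nu,\mu)$ if it is a probability distribution on $\mathbb{R}\times\mathbb{R}$ such that $\int \gamma(A,dy) = \nu(A)$ and $\int \gamma(dx,A) = \mu(A)$
for all $A$.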
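The scalar recursion $\mu^{(t+1)} = \lambda F(\mu^{(t)};\kappa)$ of Lemma 4.4 is straightforward to iterate numerically. The sketch below is only an illustration: it approximates the Gaussian expectation in $F$ by a trapezoidal rule on a truncated grid, and the names `F`, `state_evolution`, `n_grid` and `z_max` are choices made here, not the paper's. It requires $0 < \kappa < 1$ and is intended for moderate values of $\mu$, for which the truncation at `z_max` is harmless.

```python
import math

def F(mu, kappa, n_grid=4001, z_max=10.0):
    """Trapezoidal approximation of
    E[(1-kappa) / (kappa + (1-kappa) * exp(-mu/2 + sqrt(mu) * Z))], Z ~ N(0,1)."""
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie strictly between 0 and 1")
    h = 2.0 * z_max / (n_grid - 1)
    root_mu = math.sqrt(mu)
    total = 0.0
    for j in range(n_grid):
        z = -z_max + j * h
        weight = h * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        if j == 0 or j == n_grid - 1:
            weight *= 0.5            # trapezoid end-point correction
        total += weight * (1.0 - kappa) / (
            kappa + (1.0 - kappa) * math.exp(-0.5 * mu + root_mu * z))
    return total

def state_evolution(lam, kappa, t_max):
    """Return [mu^(0), ..., mu^(t_max)] with mu^(0) = 0 and mu^(t+1) = lam * F(mu^(t); kappa)."""
    traj = [0.0]
    for _ in range(t_max):
        traj.append(lam * F(traj[-1], kappa))
    return traj
```

Two sanity checks follow from the formula itself: $F(0;\kappa) = 1-\kappa$, and $F(\mu;\kappa) \to e^{\mu}$ as $\kappa \to 0$.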


5 Discussion and related work 

As mentioned in the introduction, the problem of identifying a highly connected subgraph in an 
otherwise random graph has been studied across multiple communities. Within statistical theory, 
Arias-Castro and Verzelen [ACV14, VAC13] established necessary and sufficient conditions for
distinguishing a purely random graph from one with a hidden community. With the scaling adopted
in our paper, this 'hypothesis testing' problem requires distinguishing between the following two
hypotheses:

$H_0$ : Each edge is present independently with probability $b/N$.

$H_1$ : Edges within the community are present with probability $a/N$.
Other edges are present with probability $b/N$.

Note that this problem is trivial in the present regime and can be solved, for instance, by counting
the number of edges in $G$.
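To see why edge counting suffices, one can compare the shift in the expected number of edges between the two hypotheses with the standard deviation of the edge count under $H_0$. The sketch below is a back-of-the-envelope illustration, assuming a hidden set of exactly `round(kappa * N)` vertices; the function name `edge_count_zscore` is hypothetical.

```python
import math

def edge_count_zscore(N, a, b, kappa):
    """(E_H1[#edges] - E_H0[#edges]) / std_H0(#edges) for the two hypotheses above."""
    k = round(kappa * N)                       # assumed deterministic community size
    pairs_total = N * (N - 1) // 2
    pairs_in = k * (k - 1) // 2                # pairs inside the community
    mean_h0 = pairs_total * b / N
    mean_h1 = pairs_in * a / N + (pairs_total - pairs_in) * b / N
    std_h0 = math.sqrt(pairs_total * (b / N) * (1.0 - b / N))
    return (mean_h1 - mean_h0) / std_h0
```

The mean shift is of order $N$ while the standard deviation is of order $\sqrt{N}$, so the ratio grows like $\sqrt{N}$ and the two hypotheses separate for large $N$ whenever $a \neq b$ and $\kappa > 0$.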

The sparse graph regime studied in the present paper was also recently considered in a series of
papers that analyze community detection problems using ideas from statistical physics [DKMZ11b,
DKMZ11a]. The focus of these papers is on a setting whereby the graph $G$ contains $k \ge 2$
non-overlapping communities, each of equal size $N/k$. Using our notation, vertices within the same
community are connected with probability $a/N$ and vertices belonging to different communities
are connected with probability $b/N$. Interestingly, the results of [DKMZ11a] point at a similar
phenomenon as the one studied here for $k \ge 5$. Namely, for a range of parameters the community
structure can be identified by exhaustive search, but low complexity algorithms appear to fail.

Let us mention that the very same phase transition structure arises in other inference problems,
for instance in decoding sparse graph error correcting codes, or solving planted constraint
satisfaction problems [RU08, MM09, ART06, ZK11]. A unified formalism for all of these problems
is adopted in [AM13]. All of these problems present a regime of model parameters whereby
a large gap separates the optimal estimation accuracy from the optimal accuracy achieved by
known polynomial time algorithms. Establishing that such a gap cannot be closed under standard
complexity-theoretic assumptions is an outstanding challenge. (See [HWX14] for partial evidence
in this direction, albeit in a different regime.) One can nevertheless gain useful insight by studying
classes of algorithms with increasing sophistication.

Local algorithms are a natural starting point for sparse graph problems. The problem of finding
a large independent set in a sparse random graph is closely related to the one studied here.
Indeed an independent set can be viewed as a subset of vertices that is 'less-connected' than
the background (indeed it is a subset of vertices such that the induced subgraph has no edge).
The largest independent set in a uniformly random regular graph with $N$ vertices of degree
$d$ has typical size $\alpha(d) N + o(N)$ where, for large bounded degree $d$, $\alpha(d) = 2 d^{-1} \log d\, (1 +
o_d(1))$. Hatami, Lovász and Szegedy [HLS12] conjectured that local algorithms can find
independent sets of almost maximum size, up to sublinear terms in $N$. Gamarnik and Sudan
[GS14] recently disproved this conjecture and demonstrated a constant multiplicative gap
for local algorithms. Roughly speaking, for large degrees no local algorithm can produce an
independent set of size larger than 86% of the optimum. This factor of 86% was later improved
by Rahman and Virág [RV14] to 50%. This gap is analogous to the gap in estimation error
established in the present paper. We refer to [GHH14] for a broader review of this line of
work.

As mentioned before, belief propagation (when run for an arbitrary fixed number of iterations)
is a special type of local algorithm. Further, it is basically optimal (among local algorithms)
for Bayes estimation on locally tree-like graphs. The gap between belief propagation decoding
and optimal decoding is well studied in the context of coding [RU08, MM09].

Spectral algorithms. Let $A_N$ be the adjacency matrix of the graph $G_N$ (for simplicity we set
$(A_N)_{ii} \sim \mathrm{Bernoulli}(a/N)$ for $i \in S$, and $(A_N)_{ii} \sim \mathrm{Bernoulli}(b/N)$ for $i \notin S$). We then have

$$ \mathbb{E}\{A_N \mid S\} = \frac{a-b}{N}\, \mathbf{1}_S \mathbf{1}_S^{\sf T} + \frac{b}{N}\, \mathbf{1}\mathbf{1}^{\sf T} . \tag{69} $$

This suggests that the principal eigenvector of $(A_N - (b/N)\mathbf{1}\mathbf{1}^{\sf T})$ should be localized on the set
$S$. Indeed this approach succeeds in the dense case (degree of order $N$), allowing to reconstruct
$S$ with high probability [AKS98].

In the sparse graph setting considered here, the approach fails because the operator norm
$\|A_N - \mathbb{E}\{A_N \mid S\}\|_2$ is unbounded as $N \to \infty$. Concretely, the sparse graph $G_N$ has large
eigenvalues of order $\sqrt{\log N/\log\log N}$ localized on the vertices of largest degree. This point
was already discussed in several related problems [FO05, CO10, KMO10, MNS13].

Several techniques have been proposed to address this problem, the crudest one being to
remove high-degree vertices.

We do not expect spectral techniques to overcome the limitations of local algorithms in the
present problem, even in their advanced forms that take into account degree heterogeneity.
Evidence for this claim is provided by studying the dense graph case, in which degree heterogeneity
does not pose problems. In that case spectral techniques are known to fail for $\lambda < 1$
[DM14b, MRZ14], and hence are strictly inferior to (local) message passing algorithms that
succeed for any $\lambda > 1/e$. (Note that the definition of $\lambda$ in the present paper corresponds to $\lambda^2$ in
[DM14b, MRZ14].)

Semidefinite relaxations. Convex relaxations provide a natural class of polynomial time algorithms
that are more powerful than spectral approaches. Feige and Krauthgamer [FK00,
FK03] studied the Lovász-Schrijver hierarchy of semidefinite programming (SDP) relaxations
for the hidden clique problem. In that setting, each round of the hierarchy yields a constant
factor improvement in clique size, at the price of increasing complexity. It would be interesting
to extend their analysis to the sparse regime. It is unclear whether SDP hierarchies are
more powerful than simple local algorithms in this case.

Let us finally mention that the probability measure (11) can be interpreted as the Boltzmann
distribution for a system of $\kappa N$ particles on the graph $G$, with fugacity $\gamma$, and interacting attractively
(for $\rho > 1$). Statistical mechanics analogies were previously exploited in [ISS07, GSSV11].
(See also [HRN12] for the general community detection problem.)
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A 


Proof of Proposition 4.1

For the sake of simplicity, we shall assume a slightly modified model whereby the hidden set $S$ is
uniformly random with size $|S| = k$, with $k/N \to \kappa$. Recall that, under the independent model,
$|S| \sim \mathrm{Binom}(N,\kappa)$ and hence is tightly concentrated around its mean $\kappa N$. Hence, the result under the
independent model follows by a simple conditioning argument.

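Let $L = |\hat{S} \cap S|$. By exchangeability of the graph vertices, we have

\begin{align}
P^{(N),\mathrm{ex}}_{\rm succ} &= \mathbb{P}(T_i(G) = 1 \mid i \in S) + \mathbb{P}(T_i(G) = 0 \mid i \notin S) - 1 \tag{70}\\
&= \mathbb{E}\Big\{ \frac{L}{k} + \frac{N - 2k + L}{N-k} - 1 \Big\} \tag{71}\\
&= \mathbb{E}\Big\{ \frac{L}{k} - \frac{k-L}{N-k} \Big\} \ge 1 - 2\, \mathbb{E}\Big\{ \frac{k-L}{k} \Big\} , \tag{72}
\end{align}

where the last inequality follows since, without loss of generality, $N - k \ge k$. Setting $x_* \equiv
(e/\sqrt{\kappa}) \exp(-\lambda(1-\kappa)b/(16\kappa a))$, we will prove that for any $\delta > 0$ there exists $c(\delta) > 0$ such that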


< k{\-x,-b)) < 


(73) 


The claim then follows by using the inequality (72) together with the fact that $(k - L)/k \le 1$.

For two sets $A, B \subseteq V = [N]$, we let $E(A,B)$ denote the number of edges $(i,j) \in E$ such that
$\{i,j\} \subseteq A$, but $\{i,j\} \not\subseteq B$. In order to prove Eq. (73) note that, for $\ell \in \{0,1,\dots,k\}$,

$$ \mathbb{P}(L = \ell) \le \mathbb{P}\big(\exists R \subseteq [N] : |R| = k,\ |R \cap S| = \ell,\ E(R,S) \ge E(S,R)\big) . \tag{74} $$

To see this notice that, by definition, if $L = \ell$ then $|\hat{S} \cap S| = \ell$. This means that there must exist
at least one set $R \subseteq [N]$ satisfying the following conditions:

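• $|R| = k$.

• $|R \cap S| = \ell$.

• $E(R) \ge E(S)$.

Indeed $\hat{S}$ is such a set. This immediately implies Eq. (74) by noticing that $E(S,R) = E(S) - E(S \cap R)$
and $E(R,S) = E(R) - E(S \cap R)$. By a union bound (setting $m = \binom{k}{2} - \binom{\ell}{2}$):

\begin{align}
\mathbb{P}(L = \ell) &\le \sum_{j=0}^{m} \mathbb{P}\big(\exists R_1 \subseteq S,\, R_2 \subseteq V\setminus S : |R_1| = \ell,\ |R_2| = k - \ell,\ E(S,R_1) \le j,\ E(R_1 \cup R_2, S) \ge j\big) \tag{75}\\
&\le \sum_{j=0}^{m} \binom{k}{\ell}\binom{N-k}{k-\ell}\, \mathbb{P}(\mathrm{Binom}(m; a/N) \le j)\, \mathbb{P}(\mathrm{Binom}(m; b/N) \ge j) . \tag{76}
\end{align}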

In the last inequality we used the union bound and the fact that the edges contributing to $E(S,R_1)$ and
$E(R_1 \cup R_2, S)$ are independent. Using the Chernoff bound on the tail of binomial random variables
(with $D(q\|p) = q\log(q/p) + (1-q)\log((1-q)/(1-p))$ the Kullback-Leibler divergence between
two Bernoulli random variables), we get

\begin{align}
\mathbb{P}(L = \ell) &\le (m+1) \binom{k}{\ell}\binom{N-k}{k-\ell} \max_{j \in [bm/N,\, am/N] \cap \mathbb{N}} \mathbb{P}(\mathrm{Binom}(m; a/N) \le j)\, \mathbb{P}(\mathrm{Binom}(m; b/N) \ge j) \tag{77}\\
&\le (m+1) \binom{k}{\ell}\binom{N-k}{k-\ell} \exp\Big\{ - m \min_{j \in [bm/N,\, am/N]} \big[ D(j/m \| a/N) + D(j/m \| b/N) \big] \Big\} . \tag{78}
\end{align}

Here, the first inequality follows because both probabilities are increasing for j < bm/N and 
decreasing for j > am/N. We further note that, ^ > 1 + and therefore, for q,p ^ [0,1], 

D{q\\p)>^( -J-r + l)(g-p)^ (79) 

This implies that, for pi < P 2 , we have 

min rT)(x||pi) + T>(x||p 2)1 > f — + 1 ) min \{x - pi)‘^ + {x - p 2 )‘^] (80) 

a:e[pi,P2] 4 \p2 7 a;e[pi,P2] 


- i(s ■"*)<*’■-*’T' 


( 81 ) 


We substitute the last inequality in Eq. (78), together with the bounds (^) < min[(ea/6)^, (ea/(a — 

5 ))a-b] 


P(Z, = 8) <(»+!)(—) (—) «p{- 4(l + -)((^-(^) }■ 

We let £ = k{l — x) = kN{ 1 — x) whence 

'k^ 


m / N\ / a 6\2' 


m = 




> -(k-£) = 

2 ~ 2 ^ 2 


-X. 


We therefore get 


( P \ 2k,Nx c 

- \ 

xWk/ I 


Nx K^{a — hf 


X-y/Tt/ 


< (m + 1) 


X\/K 


exp 


A(1 — K)b\ \ 


16k a 


a 

2k,Nx 


i 


(82) 

(83) 

(84) 

(85) 
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For X > + 6, the argument in parenthesis is smaller than e '=((5)/(2Ka;) therefore 

P(L = £) < + ( 86 ) 

Summing over £ < k{l — x* — <5), we get P(LA:(1 — x* — 5)) < k{m + 1) which implies the 

claim (73), after eventually adjusting c((5), since k{m + 1) < N^. 


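B Proof of Theorem 1

B.1 Proof of Lemma 4.4

Throughout this section we will drop the superscript fr from $\xi_{0/1}^{(t),\mathrm{fr}}$, $P_{0/1}^{(t),\mathrm{fr}}$.

Recall that convergence in $W_2$ distance is equivalent to weak convergence, plus convergence of
the first two moments [Vil08, Theorem 6.9]. We will prove the following by induction over $t$:

I. The first moments $\mathbb{E}\{|\xi_0^{(t)}|\}$, $\mathbb{E}\{|\xi_1^{(t)}|\}$ are finite and we have

\begin{align}
\lim_{a,b\to\infty} \mathbb{E}\{\xi_0^{(t)}\} &= -\log\Big(\frac{1-\kappa}{\kappa}\Big) - \frac{1}{2}\mu^{(t)} , \tag{87}\\
\lim_{a,b\to\infty} \mathbb{E}\{\xi_1^{(t)}\} &= -\log\Big(\frac{1-\kappa}{\kappa}\Big) + \frac{1}{2}\mu^{(t)} . \tag{88}
\end{align}

II. The variances $\mathrm{Var}(\xi_0^{(t)})$, $\mathrm{Var}(\xi_1^{(t)})$ are finite and they converge:

\begin{align}
\lim_{a,b\to\infty} \mathrm{Var}(\xi_0^{(t)}) &= \mu^{(t)} , \tag{89}\\
\lim_{a,b\to\infty} \mathrm{Var}(\xi_1^{(t)}) &= \mu^{(t)} . \tag{90}
\end{align}

III. Weak convergence:

\begin{align}
P_0^{(t)} &\Rightarrow \mathsf{N}\Big( -\log\Big(\frac{1-\kappa}{\kappa}\Big) - \frac{1}{2}\mu^{(t)} ,\; \mu^{(t)} \Big) , \tag{91}\\
P_1^{(t)} &\Rightarrow \mathsf{N}\Big( -\log\Big(\frac{1-\kappa}{\kappa}\Big) + \frac{1}{2}\mu^{(t)} ,\; \mu^{(t)} \Big) . \tag{92}
\end{align}

These claims obviously hold for $t = 0$. Next, assuming that they hold up to iteration $t$, we need
to prove them for iteration $t + 1$. For the sake of brevity, we will only present this calculation for
$P_0^{(t+1)}$, since the derivation for $P_1^{(t+1)}$ is completely analogous.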


Let us start by considering Eq. (87). First notice that the absolute value of the right-hand side of
Eq. (21) is upper bounded by


Lqo Lqi 

h + b72^(i + icgi) + c2^(i + iegi), 

i=l i=l 


(93) 
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and hence < oo follows from the induction hypothesis I(t) and the fact that Lqq,Lqi 

are Poisson. Next to prove Eq. (87), we take expectation of Eq. (21), and let, for simplicity, 
1{k) = log((l - k)/k): 


At) 

e?o 


= -^(k) - K{a - 6) + (1 - K)bElog ( 1 + (p - 1)- _ 

1 + e^o 




(94) 


+ K6Elog + (p - 1) 
= —1{k) — K(a — 6)+ 


+ (1 — K)(a — b)K 


(a — b)'^ 

- (1 - k )- --^E 

^ ^ 26 


At) 

e^i 


1 + e^i 


(t) 


(95) 



+ 0 


(« - bf 

62 


where the last equality follows from bounded convergence, since, for all a: G M, 0 < e^/(l + e^) < 1. 
Note that the laws of and satisfy the symmetry property ( |3l[ ). Hence, for any measurable 
function p : M —)> M such that the expectations below make sense, we have 


(1 - ^)Ep(eW) + KEgiC?) = ^E{(1 + e-«i‘') p(eP)} . 

In particular applying this identity to g{x) = e*/(1 + e^) and g{x) = [e*/(1 + e^)]^, we get 


(1 — k)E 


(1 — k)E 



= K . 


= kE 


At) 

e^i 


1 + e«i 


(0 


(96) 

(97) 

(98) 


Substituting in Eq. (|95|), and expressing a in terms of 6, k, A we get 

+ 0(6-i/2) 


= _ ll^E ^ 

^ ' 2k 


At) 

e^i 


1 + e^i 


(t) 


1 + exp {1{k) — p(d/2 + -y/pld Z}^ 


+ 06 ( 1 ), 


(99) 

( 100 ) 


where $o_b(1)$ denotes a quantity vanishing as $b \to \infty$. The last equality follows from induction
hypothesis III($t$) and the fact that $g(x) = 1/(1 + e^{-x})$ is bounded continuous, with $Z \sim \mathsf{N}(0,1)$.
This yields the desired claim (87) after comparing with Eq. (66).

Consider next Eq. (89). The upper bound on the right-hand side of Eq. (21) given by Eq. (93)
immediately implies that $\mathrm{Var}(\xi_0^{(t+1)}) < \infty$. In order to establish Eq. (89), we recall an elementary
formula for the variance of a Poisson sum. If $L$ is a Poisson random variable and $\{X_i\}_{i\ge 1}$ are i.i.d.
with finite second moment, then

$$ \mathrm{Var}\Big( \sum_{i=1}^{L} X_i \Big) = \mathbb{E}(L)\, \mathbb{E}(X_1^2) . \tag{101} $$
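For completeness, the Poisson sum formula (101) can be recovered from the law of total variance. The sketch below assumes, as is implicit in the statement, that $L$ is independent of the i.i.d. variables $X_i$; the symbol $\ell = \mathbb{E}(L)$ is introduced here only for illustration.

```latex
\begin{align*}
\mathrm{Var}\Big(\sum_{i=1}^{L} X_i\Big)
  &= \mathbb{E}\Big[\mathrm{Var}\Big(\sum_{i=1}^{L} X_i \,\Big|\, L\Big)\Big]
   + \mathrm{Var}\Big(\mathbb{E}\Big[\sum_{i=1}^{L} X_i \,\Big|\, L\Big]\Big) \\
  &= \mathbb{E}(L)\,\mathrm{Var}(X_1) + \mathrm{Var}(L)\,\big(\mathbb{E}X_1\big)^2 \\
  &= \ell\,\Big(\mathrm{Var}(X_1) + \big(\mathbb{E}X_1\big)^2\Big)
   = \mathbb{E}(L)\,\mathbb{E}(X_1^2),
\end{align*}
```

where the last line uses $\mathrm{Var}(L) = \mathbb{E}(L) = \ell$ for a Poisson random variable.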


Applying this to Eq. (21), and expanding for large b thanks to the bounded convergence theorem, 
we get 

= (1 - k)6 E | [log ^1 + (p - 1) ^ ^ ^ ]^| + | [log + (p - 1) ^ ^ ^ ]^| 


{a — hY 

= (1 - k )-—^E 


oSO 


it) 


1 + e«i 


(t) 


+ K -—-^E 


0^1 


(t) 


1 + e«: 


(t) 


+ 0 ( 6 -^/ 2 ) 




At) 

e^i 


l + e«: 


(t) 




( 102 ) 

(103) 

(104) 


where the last equality follows by applying again Eq. (98). By using the induction hypothesis 111(6) 
and the fact that g{x) = (1 + e“*) is bounded Lipschitz, 


lim Var(P, 

a,b—^oo 




AE 


K 


1 + exp {1{k) — pW/2 + y/pW Z} j 


) =AF(pW;K) 


(105) 


which is Eq. (66). 


We finally consider Eq. (91). By subtracting the mean, we can rewrite Eq. (21) as 

Lqo Lqi 

^ W + + (Loo - ELoo)E/(eg) + (Loi - ELoi)E/(^g), (106) 

i=l i=l 

where Xi = — E/(^q*)), Yi = /(|f]) — E/(^[*]). Note that Xi, Yi have zero mean and, by the 

calculation above, they have variance E{X?} = K{Y^} = 0(1/6). Denoting the right hand side by 
Sb-- 


ELoo EI/oi 

Sb=Y.Xi+Yl^i + {Loo - ELoo)E/(eg) + (Lqi - ELoi)E/(^g) + op(l), (107) 

i=l i=l 


because (for instance) ~ ^ order -v/6 independent random variables 

with zero mean and variance of order 1 /6. Note that 


lim ELo,oVar(Ai) + lim ELo,iVar(yi) + lim Var(Lo,o)E/(^^*|) + lim Var(Lo,i)E/(^f[) 

a,6^00 a ^ b^oo a,6^00 ’ a,6^00 ’ 

= lim {(1 - k)6 E[/(^J[J)2] + K6E[/(^j[J)2]} = p(*+^), 
a,b^oo 

(108) 
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where the last equality follows by the calculation above. Hence, by applying the central limit 
theorem to each of the four terms in Eq. (107) and noting that they are independent, we conclude 
that Sb converges in distribution to 

B.2 Proof of Theorem 1(a)

Define the event $H \equiv \{\xi \ge \log(\kappa/(1-\kappa))\}$, and write $P_{0/1}^{(t)}$ for $P_{0/1}^{(t),\mathrm{fr}}$. From Eq. (48) we have


(t;fr) = pW(H'=) + Pi'^(H)-l 
= pW(H)-pW(H) 


it). 


= E 


W. 




(i) 


dP, 


< Eq 


(t) /dP- 


it) 


1/2 


dP, 


it) 


- 1 


pW(H)V2_pW(yi) 


< sup < Eq 
q>0 


- 




f( 

1/2 


dP, 


it) 


q-q 


dP 


it) 


dP, 


it) 


- 1 


(109) 

( 110 ) 

( 111 ) 

( 112 ) 

(113) 

(114) 


Using Eq. (31), and the fact that we get 

Psucc(t;fr) < J - 1^ 


(115) 


Call $x_t \equiv (1-\kappa)^2 \kappa^{-2}\, \mathbb{E}\{e^{2\xi_0^{(t)}}\}$. By the initialization (25), $x_0 = 1$. Taking exponential moments of
Eq. (21), we get


Xi+i = exp < —2Ka + (2k — 1)6 + (1 — k)6E 


'(l + pe^o^Y 

+ k6E 

'fl + pe^^^'Y' 


V l + e«o‘' ) 

V l + e€i‘’ ) 




Note that by Eq. (31), for any measurable function g : 
make sense, we have 


(116) 

such that the expectations below 


(1 - K)Eff(eW) + ^E5(d*^) = (1 - Ac) E{(1 + e«o )5(e?^)} . 
Applying this to g{x) = (1 + pe^)‘^l{l + e^)^, we get 


xt+i = exp < —Ina + (2k — 1)6 + (1 — k)6E 


(l + pe«®) 2 ' 

1 + e^o 


(117) 


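(118)

Now we claim that, for $z \ge 0$, we have

$$ \frac{(1 + \rho z)^2}{1 + z} \le 1 + (2\rho - 1) z + (\rho - 1)^2 z^2 . \tag{119} $$

This can be checked, for instance, by multiplying both sides by $(1 + z)$ and simplifying. Using
$\mathbb{E}\{e^{\xi_0^{(t)}}\} = \kappa/(1-\kappa)$ and $\mathbb{E}\{e^{2\xi_0^{(t)}}\} = x_t\, \kappa^2/(1-\kappa)^2$, we get

\begin{align}
x_{t+1} &\le \exp\Big\{ -2\kappa a + (2\kappa - 1) b + (1-\kappa) b \Big( 1 + (2\rho - 1)\frac{\kappa}{1-\kappa} + (\rho - 1)^2 \frac{\kappa^2}{(1-\kappa)^2}\, x_t \Big) \Big\} \tag{120}\\
&= e^{\lambda x_t} . \tag{121}
\end{align}

Let $\overline{x}_t$ be the solution of the above recursion with equality, i.e. $\overline{x}_0 = 1$ and

$$ \overline{x}_{t+1} = e^{\lambda \overline{x}_t} . \tag{122} $$

It is a straightforward exercise to see that $\overline{x}_t$ is monotone increasing in $t$ and $\lambda$. Further, for
$\lambda < 1/e$, $\lim_{t\to\infty} \overline{x}_t(\lambda) = x_*(\lambda)$, the smallest positive solution of $x = e^{\lambda x}$, and $x_*(\lambda) < x_*(1/e) = e$.

Hence $x_t \le \overline{x}_t \le x_*(\lambda)$ which, together with Eq. (115), finishes the proof.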
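The two algebraic steps behind (119) and the passage from (120) to (121) can be verified directly. The sketch below assumes $\rho = a/b$, which is the value for which the cancellation goes through (the definition of $\rho$ lies outside this section), and uses $\lambda = \kappa^2(a-b)^2/((1-\kappa)b)$ as in Theorem 1.

```latex
\begin{align*}
&(1+z)\big[1 + (2\rho-1)z + (\rho-1)^2 z^2\big] - (1+\rho z)^2 = (\rho-1)^2 z^3 \;\ge\; 0
  \qquad (z \ge 0), \\[4pt]
&-2\kappa a + (2\kappa - 1)b + (1-\kappa)b + (2\rho-1)\kappa b
  = -2\kappa a + \kappa b + (2\kappa a - \kappa b) = 0, \\[4pt]
&(1-\kappa)\, b\, (\rho-1)^2 \frac{\kappa^2}{(1-\kappa)^2}\, x_t
  = \frac{\kappa^2 (a-b)^2}{(1-\kappa)\, b}\, x_t = \lambda\, x_t .
\end{align*}
```

The first line gives (119) after dividing by $1+z > 0$; the second and third lines show that the constant terms in the exponent of (120) cancel and that the remaining term is exactly $\lambda x_t$.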

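B.3 Proof of Theorem 1(b)

Note that by monotonicity $P_{\rm succ}(\mathrm{fr}) \ge P_{\rm succ}(t;\mathrm{fr})$ and hence it is sufficient to lower bound the limit
of the latter quantity. By Lemma 4.4 we have

$$ \lim_{a,b\to\infty} P_{\rm succ}(t;\mathrm{fr}) = 1 - 2\Phi\big( -\sqrt{\mu^{(t)}}/2 \big) , \tag{123} $$

where $\Phi(x) \equiv \int_{-\infty}^{x} e^{-z^2/2}\, dz/\sqrt{2\pi}$ is the Gaussian distribution function, and $\mu^{(t)}$ is defined recursively by
Eq. (66) with $\mu^{(0)} = 0$. Hence for all $t \ge 0$

$$ \lim_{\kappa\to 0} P^{\rm large\ deg}_{\rm succ}(\mathrm{fr};\kappa,\lambda) \ge \lim_{\kappa\to 0} \Big\{ 1 - 2\Phi\big( -\sqrt{\mu^{(t)}}/2 \big) \Big\} . \tag{124} $$

It is therefore sufficient to prove that

$$ \lim_{t\to\infty} \lim_{\kappa\to 0} \mu^{(t)} = \infty . \tag{125} $$

Now by monotone convergence, we have

$$ \lim_{\kappa\to 0} F(\mu;\kappa) = e^{\mu} . \tag{126} $$

Further, $F(\mu;\kappa)$ increases monotonically towards its limit as $\kappa \to 0$. Furthermore, $F(\mu;\kappa)$ is increasing
in $\mu$ for any fixed $\kappa \ge 0$. By induction over $t$ we prove that $\lim_{\kappa\to 0} \mu^{(t)} = \tilde{\mu}^{(t)}$ (the limit being
monotone from below), where $\tilde{\mu}^{(0)} = 0$ and for all $t \ge 0$

$$ \tilde{\mu}^{(t+1)} = \lambda\, e^{\tilde{\mu}^{(t)}} . \tag{127} $$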
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In order to prove this claim, note that the base case of the induction is trivial and (writing explicitly 
the dependence on n 


. (128) 

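On the other hand, for a fixed $\kappa_0 > 0$,

$$ \lim_{\kappa\to 0} \mu^{(t+1)}(\kappa) \ge \lambda \lim_{\kappa\to 0} F(\mu^{(t)}(\kappa_0);\kappa) = \lambda\, e^{\mu^{(t)}(\kappa_0)} . \tag{129} $$

The claim follows since $\kappa_0$ can be taken arbitrarily small.

Now it is easy to show from Eq. (127) that $\lim_{t\to\infty} \tilde{\mu}^{(t)} = \infty$ for $\lambda > 1/e$ (this is indeed closely
related to the sequence $\overline{x}_t$ constructed in the previous section, since $\overline{x}_t = \exp(\tilde{\mu}^{(t)})$).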

References 

[ACV14] Ery Arias-Castro and Nicolas Verzelen, Community detection in dense random net¬ 
works, The Annals of Statistics 42 (2014), no. 3, 940-969. 

[AKS98] Noga Alon, Michael Krivelevich, and Benny Sudakov, Finding a large hidden clique in 
a random graph, Proceedings of the ninth annual ACM-SIAM symposium on Discrete 
algorithms. Society for Industrial and Applied Mathematics, 1998, pp. 594-598. 

[AM13] Emmanuel Abbe and Andrea Montanari, Conditional random fields, planted constraint 
satisfaction and entropy concentration. Approximation, Randomization, and Combi¬ 
natorial Optimization. Algorithms and Techniques, Springer, 2013, pp. 332-346. 

[ART06] Dimitris Achlioptas and Eederico Ricci-Tersenghi, On the solution-space geometry of 
random constraint satisfaction problems. Proceedings of the thirty-eighth annual ACM 
symposium on Theory of computing, ACM, 2006, pp. 130-139. 

[COlO] Amin Coja-Oghlan, Craph partitioning via adaptive spectral techniques. Combina¬ 
torics, Probability and Computing 19 (2010), no. 02, 227-284. 

[DGGP14] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres, Finding hidden cliques in linear time 
with high probability. Combinatorics, Probability and Computing 23 (2014), no. 01, 
29-49. 

[DKMZlla] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborova, Asymp¬ 
totic analysis of the stochastic block model for modular networks and its algorithmic 
applications. Physical Review E 84 (2011), no. 6, 066106. 

[DKMZllb] _, Inference and phase transitions in the detection of modules in sparse networks. 

Physical Review Letters 107 (2011), no. 6, 065701. 

[DM14a] Y. Deshpande and A. Montanari, Information-theoretically optimal sparse PCA, Infor¬ 
mation Theory (ISIT), 2014 IEEE International Symposium on, June 2014, pp. 2197- 
2201. 

[DM14b] Yash Deshpande and Andrea Montanari, Finding hidden cliques of size -s/N/e in 
nearly linear time. Foundations of Computational Mathematics (2014), 1-60. 


27 





[DMU04] 

[FKOO] 

[FK03] 

[FO05] 

[ForlO] 

[GHH14] 

[GM75] 

[GS14] 

[GSSVll] 

[Has96] 

[HLS12] 

[HRN12] 

[HWX14] 

[ISS07] 

[Jer92] 


Changyan Di, Andrea Montanari, and Rudiger Urbanke, Weight distributions of Idpc 
code ensembles: combinatorics meets statistical physics, IEEE International Sympo¬ 
sium on Information Theory, 2004. 

Uriel Eeige and Robert Krauthgamer, Finding and certifying a large hidden clique in 
a semirandom graph, Random Structures and Algorithms 16 (2000), no. 2, 195-208. 

_, The probable value of the lovdsz-schrijver relaxations for maximum indepen¬ 
dent set, SIAM Journal on Computing 32 (2003), no. 2, 345-370. 

Uriel Feige and Eran Ofek, Spectral techniques applied to sparse random graphs. Ran¬ 
dom Structures & Algorithms 27 (2005), no. 2, 251-275. 

Santo Fortunato, Community detection in graphs. Physics Reports 486 (2010), no. 3, 
75-174. 

David Gamarnik, Mathieu Hemery, and Samuel Hetterich, Local algorithms for graphs, 
arXiv:1409.5214 (2014). 

Geoffrey R Grimmett and Colin JH McDiarmid, On colouring random graphs. Mathe¬ 
matical Proceedings of the Cambridge Philosophical Society, vol. 77, Cambridge Univ 
Press, 1975, pp. 313-324. 

David Gamarnik and Madhu Sudan, Limits of local algorithms over sparse random 
graphs. Proceedings of the 5th conference on Innovations in theoretical computer sci¬ 
ence, ACM, 2014, pp. 369-376. 

Alexandre Gaudilliere, Benedetto Scoppola, Elisabetta Scoppola, and Massimiliano 
Viale, Phase transitions for the cavity approach to the clique problem on random 
graphs. Journal of Statistical Physics 145 (2011), no. 5, 1127-1155. 

Johan Hastad, Clique is hard to approximate within n l-&epsiv. Foundations of Com¬ 
puter Science, 1996. Proceedings., 37th Annual Symposium on, IEEE, 1996, pp. 627- 
636. 

Hamed Hatami, Laszlo Lovasz, and Balazs Szegedy, Limits of local-global convergent 
graph sequences, arXiv:1205.4356 (2012). 

Dandan Hu, Peter Ronhovde, and Zohar Nussinov, Phase transitions in random potts 
systems and the community detection problem: spin-glass type and dynamic perspec¬ 
tives, Philosophical Magazine 92 (2012), no. 4, 406-445. 

Bruce Hajek, Yihong Wu, and Jiaming Xu, Computational lower bounds for commu¬ 
nity detection on random graphs, arXiv: 1406.6625 (2014). 

Antonio Iovanella, Benedetto Scoppola, and Elisabetta Scoppola, Some spin glass
ideas applied to the clique problem, Journal of Statistical Physics 126 (2007), no. 4-5,
895-915.

Mark Jerrum, Large cliques elude the Metropolis process, Random Structures & Algorithms
3 (1992), no. 4, 347-359.


[Kho01]

Subhash Khot, Improved inapproximability results for maxclique, chromatic number
and approximate graph coloring, Foundations of Computer Science, 2001. Proceedings.
42nd IEEE Symposium on, IEEE, 2001, pp. 600-609.

[KMM+13] 

Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka
Zdeborová, and Pan Zhang, Spectral redemption in clustering sparse networks,
Proceedings of the National Academy of Sciences 110 (2013), no. 52, 20935-20940.

[KMO10]

Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh, Matrix completion
from a few entries, Information Theory, IEEE Transactions on 56 (2010), no. 6, 2980-2998.

[LC98] 

EL Lehmann and George Casella, Theory of point estimation, 2 ed., Springer, 1998. 

[LKZ15] 

Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová, Phase Transitions in Sparse
PCA, arXiv:1503.00338 (2015).

[MM09] 

Marc Mézard and Andrea Montanari, Information, Physics and Computation, Oxford,
2009.

[MNS13] 

Elchanan Mossel, Joe Neeman, and Allan Sly, A proof of the block model threshold
conjecture, arXiv:1311.4115 (2013).

[Mon08] 

Andrea Montanari, Estimating random variables from random sparse observations,
Eur. Trans. on Telecom. 19 (2008), 385-403.

[Mon15]

Andrea Montanari, Statistical mechanics and algorithms on sparse and random graphs, 
2015, In preparation: Draft available online. 

[MP01]

Marc Mézard and Giorgio Parisi, The Bethe lattice spin glass revisited, The European
Physical Journal B-Condensed Matter and Complex Systems 20 (2001), no. 2, 217-233.

[MRZ14] 

Andrea Montanari, Daniel Reichman, and Ofer Zeitouni, On the limitation of spectral
methods: From the Gaussian hidden clique problem to rank one perturbations of
Gaussian tensors, arXiv:1411.6149 (2014).

[Nis01]

Hidetoshi Nishimori, Statistical Physics of Spin Glasses and Information Processing: 
An Introduction, Oxford University Press, 2001. 

[RU08] 

Tom J. Richardson and Rüdiger Urbanke, Modern Coding Theory, Cambridge University
Press, Cambridge, 2008.

[RV14] 

Mustazee Rahman and Bálint Virág, Local algorithms for independent sets are half-optimal,
arXiv:1402.0485 (2014).

[SWPN09] 

Andrey A Shabalin, Victor J Weigman, Charles M Perou, and Andrew B Nobel, 
Finding large average submatrices in high dimensional data, The Annals of Applied 
Statistics (2009), 985-1012. 


[VAC13]

Nicolas Verzelen and Ery Arias-Castro, Community detection in sparse random networks,
arXiv:1308.2955 (2013).

[Vil08]

Cédric Villani, Optimal transport: old and new, vol. 338, Springer, 2008.

[VL07]

Ulrike Von Luxburg, A tutorial on spectral clustering, Statistics and computing 17
(2007), no. 4, 395-416.

[ZK11]

Lenka Zdeborová and Florent Krzakala, Quiet planting in the locked constraint satisfaction
problems, SIAM Journal on Discrete Mathematics 25 (2011), no. 2, 750-770.

